@dansup @pixelfed You may be interested in the International Standard Content Code (ISCC) (introduction: https://iscc.codes, specification: https://core.iscc.codes), which will become an open standard for decentralised digital content identification. Anyone can generate, on their own premises, a mix of cryptographic and similarity-preserving hashes, which makes it possible to match near-duplicate content (of all media types and formats, btw) just by comparing the ISCC codes.
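
To illustrate the general idea of matching by comparing similarity-preserving hashes, here is a rough sketch. It is not the ISCC library's API: the function names, the raw-bytes digest format, and the `max_distance` threshold are all hypothetical, and it only assumes that similar content yields digests differing in few bits.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count the bits that differ between two equal-length digests."""
    if len(a) != len(b):
        raise ValueError("digests must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def is_near_duplicate(a: bytes, b: bytes, max_distance: int = 8) -> bool:
    """Treat two digests as near-duplicates when few bits differ.

    The default threshold is illustrative only, not taken from the ISCC spec.
    """
    return hamming_distance(a, b) <= max_distance
```

The point is that matching would need only the codes, never the original files; a real deployment would take the code layout and thresholds from the ISCC specification.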