How can I recognize slightly modified images?
I have a very large database of jpeg images, about 2 million. I would like to do a fuzzy search for duplicates among those images. Duplicate images are two images that have many (around half) of their pixels with identical values and the rest are off by about +/- 3 in their R/G/B values. The images are identical to the naked eye. It's the kind of difference you'd get from re-compressing a jpeg. I already have a foolproof way to detect if two images are identical: I sum the delta-brightness over all the pixels and compare to a threshold. This method has proven 100% accurate but doing 1 photo