A while ago, I spent some time searching for ways to determine whether two images are identical in order to answer this question. I now face a slightly different problem: I have
I think the general answer to this question calls for an unsupervised machine learning approach that generates local invariant features (basically, a fancy way of finding hashes that don't change with scaling or rotation) and then runs a clustering algorithm on them. Here are some papers that might be relevant:
Well, I think dHash is what you need for this. You would just have to extend dHash to take rotation into account, which means hashing each image in four orientations (0, 90, 180 and 270 degrees), so 2000 images are effectively treated as 8000.
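To make the idea concrete, here is a rough sketch of a rotation-aware dHash in plain Python. It assumes the image has already been decoded into a 2D list of grayscale values (loading is left out, since that needs an imaging library). All function names, the nearest-neighbour resize, and the `max_distance=5` threshold are my own illustrative choices, not part of any particular dHash implementation; real implementations usually use smoother resampling.

```python
def resize_gray(pixels, new_w, new_h):
    """Nearest-neighbour resize of a 2D list of grayscale values."""
    old_h, old_w = len(pixels), len(pixels[0])
    return [
        [pixels[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def dhash(pixels, hash_size=8):
    """Difference hash: one bit per pair of horizontally adjacent pixels."""
    # Shrink to (hash_size + 1) x hash_size so each row yields hash_size bits.
    small = resize_gray(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for x in range(hash_size):
            bits = (bits << 1) | (1 if row[x] > row[x + 1] else 0)
    return bits


def rotate90(pixels):
    """Rotate a 2D list 90 degrees clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


def rotation_hashes(pixels, hash_size=8):
    """dHash of the image at 0, 90, 180 and 270 degrees."""
    hashes = []
    current = pixels
    for _ in range(4):
        hashes.append(dhash(current, hash_size))
        current = rotate90(current)
    return hashes


def hamming(a, b):
    """Number of differing bits between two integer hashes."""
    return bin(a ^ b).count("1")


def is_rotated_duplicate(pixels_a, pixels_b, max_distance=5):
    """True if some 90-degree rotation of b hashes close to a."""
    hash_a = dhash(pixels_a)
    return any(
        hamming(hash_a, hash_b) <= max_distance
        for hash_b in rotation_hashes(pixels_b)
    )


if __name__ == "__main__":
    # Synthetic test pattern standing in for a decoded image.
    image = [[(x * 7 + y * 13) % 256 for x in range(40)] for y in range(30)]
    print(is_rotated_duplicate(image, rotate90(image)))
```

Note that this only covers rotations by multiples of 90 degrees, which is where the factor of four comes from. Arbitrary angles would need a different kind of feature.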
I wrote a pure Java library for exactly this a few days ago. You give it a directory path (subdirectories are included), and it returns a list of the duplicate images, with absolute paths, that you may want to delete. Alternatively, you can use it to find all the unique images in a directory.
It uses the AWT API internally, so it can't be used on Android. Since ImageIO has problems reading a lot of newer image types, the library uses the TwelveMonkeys jar internally.
https://github.com/srch07/Duplicate-Image-Finder-API
A jar with its dependencies bundled can be downloaded from https://github.com/srch07/Duplicate-Image-Finder-API/blob/master/archives/duplicate_image_finder_1.0.jar
The API can also find duplicates among images of different sizes.
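For readers who just want to see the general shape of "walk a directory tree and list the duplicates", here is a minimal Python sketch. To be clear, this is not the library's API or its algorithm: `find_duplicates`, `file_digest` and `IMAGE_EXTENSIONS` are hypothetical names, and the default fingerprint is a SHA-256 of the raw file bytes, which only catches byte-for-byte copies. Detecting resized copies, as the library above is said to do, would require passing in a perceptual hash as the `fingerprint` argument instead.

```python
import hashlib
import os
from collections import defaultdict

# Assumed set of extensions to treat as images; adjust as needed.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".bmp"}


def file_digest(path):
    """SHA-256 of the file bytes; a stand-in for a perceptual hash."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root, fingerprint=file_digest):
    """Return lists of absolute paths whose fingerprints are equal."""
    groups = defaultdict(list)
    # os.walk descends into subdirectories automatically.
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                path = os.path.abspath(os.path.join(folder, name))
                groups[fingerprint(path)].append(path)
    # Keep only fingerprints shared by more than one file.
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Each inner list is one set of duplicates. Keeping the first entry of every set and deleting the rest would leave only unique images.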