Question
I'd like to find and remove an image in a series of folders. The problem is that the image names are not necessarily the same.
What I did was copy an arbitrary string of bytes from the image and search for it like
grep -ir 'YA'uu�KU���^H2�Q�W^YSp��.�^H^\^Q��P^T' .
But since there are thousands of images, this method takes forever. Also, some of the images are copies of the original converted with ImageMagick, so I can't use file size to find them all.
So I'm wondering what is the most efficient way to do so?
Answer 1:
Updated Answer
If you have the checksum of a specific file in mind that you want to compare with, you can checksum all files in all subdirectories and find the one that is the same:
find . -name \*.jpg -exec bash -c 's=$(md5 < {}); echo $s {}' \; | grep "94b48ea6e8ca3df05b9b66c0208d5184"
Or this may work for you too:
find . -name \*.jpg -exec md5 {} \; | grep "94b48ea6e8ca3df05b9b66c0208d5184"
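The `md5` command above is the BSD/macOS tool; on Linux the GNU coreutils equivalent is `md5sum`, which prints the hash first and can be batched with `-exec ... +`. A self-contained sketch (the `demo` directory and its file are placeholders created just for illustration):

```shell
# Hypothetical setup: a small tree with one jpg whose hash we capture
mkdir -p demo && printf 'x' > demo/a.jpg
target=$(md5sum demo/a.jpg | cut -d' ' -f1)   # hash of the file we are hunting

# md5sum prints "<hash>  <file>"; `-exec ... +` runs md5sum on many
# files per invocation instead of forking once per file
find demo -name '*.jpg' -exec md5sum {} + | grep "$target"
```

This prints the matching hash/path line for every file whose content hashes to `$target`, regardless of its name.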
Original Answer
The easiest way is to generate an md5 checksum once for each file. Depending on how your md5
program works, you would do something like this:
find . -name \*.jpg -exec bash -c 's=$(md5 < {}); echo $s {}' \;
94b48ea6e8ca3df05b9b66c0208d5184 ./a.jpg
f0361a81cfbe9e4194090b2f46db5dad ./b.jpg
c7e4f278095f40a5705739da65532739 ./c.jpg
Or maybe you can use
md5 -r *.jpg
94b48ea6e8ca3df05b9b66c0208d5184 a.jpg
f0361a81cfbe9e4194090b2f46db5dad b.jpg
c7e4f278095f40a5705739da65532739 c.jpg
Now you can sort the output and use uniq
(comparing only the hash column) to find all duplicates.
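A minimal sketch of that duplicate-finding step, assuming GNU coreutils (`uniq -w` is a GNU extension; the three demo files are created just for illustration):

```shell
# Demo tree: a.jpg and b.jpg are byte-identical, c.jpg differs
mkdir -p dupdemo && cd dupdemo
printf 'same' > a.jpg
printf 'same' > b.jpg
printf 'diff' > c.jpg

# md5sum output begins with a 32-character hash; sort groups equal
# hashes together, and `uniq -w32 -D` prints every line whose first
# 32 characters (the hash) occur more than once, i.e. all duplicates
md5sum *.jpg | sort | uniq -w32 -D
```

Here the pipeline prints the two lines for `a.jpg` and `b.jpg` and omits `c.jpg`, so each printed group is one set of identical files you can review and delete.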
Source: https://stackoverflow.com/questions/34747987/how-to-find-duplicated-jpgs-by-content