I\'m looking to create a base table of images and then compare any new images against that to determine if the new image is an exact (or close) duplicate of the base.
<
My company has about 24million images come in from manufacturers every month. I was looking for a fast solution to ensure that the images we upload to our catalog are new images.
I want to say that I have searched the internet far and wide to attempt to find an ideal solution. I even developed my own edge detection algorithm.
I have evaluated speed and accuracy of multiple models.
My images, which have white backgrounds, work extremely well with phashing. Like redcalx said, I recommend phash or ahash. DO NOT use MD5 Hashing or anyother cryptographic hashes. Unless, you want only EXACT image matches. Any resizing or manipulation that occurs between images will yield a different hash.
For phash/ahash, Check this out: imagehash
I wanted to extend *redcalx'*s post by posting my code and my accuracy.
What I do:
from PIL import Image
from PIL import ImageFilter
import imagehash
img1=Image.open(r"C:\yourlocation")
img2=Image.open(r"C:\yourlocation")
if img1.width
Here are some of my results:
item1 item2 totalsimilarity
desk1 desk1 3
desk1 phone1 22
chair1 desk1 17
phone1 chair1 34
Hope this helps!