Question
Given two images:
image1.jpg
image2.jpg
What's a fast way to detect if they are visually identical in Python? For example, they may have different EXIF data, which would yield different checksums even though the image data is the same.
ImageMagick has an excellent tool, "identify", that produces a visual hash of an image, but it's very processor intensive.
Answer 1:
Using PIL/Pillow:
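from PIL import Image

# Compare decoded pixel values only, so metadata such as EXIF is ignored.
im1 = Image.open('image1.jpg')
im2 = Image.open('image2.jpg')
# Check the size first: flattened pixel lists could match for differently shaped images.
if im1.size == im2.size and list(im1.getdata()) == list(im2.getdata()):
    print("Identical")
else:
    print("Different")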
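Building a Python list of every pixel gets slow for large images. A possible variation, sketched here on the assumption that Pillow is installed, compares the raw decoded pixel buffers instead. The helper name same_pixels is made up for illustration.

```python
from PIL import Image


def same_pixels(im1, im2):
    """Return True if two Pillow images decode to exactly the same pixels."""
    # Images with different dimensions or modes cannot be pixel-identical.
    if im1.size != im2.size or im1.mode != im2.mode:
        return False
    # tobytes() returns the raw pixel buffer; metadata such as EXIF is not part of it.
    return im1.tobytes() == im2.tobytes()
```

It could be called as same_pixels(Image.open('image1.jpg'), Image.open('image2.jpg')). Note that this only detects exact pixel equality: a JPEG that has been re-encoded will usually come out as different.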
Answer 2:
I'm submitting my way of tackling this anyway, even though the OP says that ImageMagick's approach is too processor intensive, and even though my way does not involve Python. Maybe my answer will be useful to other people who arrive at this page via a search engine.
Be aware that any image comparison which is supposed to discover fine differences in hi-res images is more processor intensive than a discovery of big differences in low-res images, as it has to compare a lot more pixels.
Visualization of Differences
Here are two ImageMagick commands that compare two (same-sized!) images and return all differing pixels as red, identical pixels as white. The first one uses the reference image as a faded-out background for the composition of the red-white pixel matrix; the second one (with -compose src) leaves that background out. .img may be any of the IM-supported formats (.png, .PnG, .pNG, .PNG, .jpg, .jpeg, .jPeG, .tif, .tiff, .ppm, .gif, .pdf, ...):
compare reference.img similar.img delta.img
compare reference.img similar.img -compose src delta.img
By default, the comparison is made at 72 PPI. If you need more resolution (for example, with a vector-based image such as a PDF page), you can add -density to increase it. Of course, the processing time will increase accordingly:
compare -density 300 reference.img similar.img delta.img
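To get a feeling for how quickly the work grows with density, here is a small Python sketch. It relies only on the fact that one PDF point is 1/72 inch; the page size (US Letter, 612 x 792 pt) and the function name raster_size are assumptions picked for illustration.

```python
def raster_size(width_pt, height_pt, ppi):
    """Approximate pixel dimensions of a page rendered at the given PPI."""
    # One PostScript/PDF point is 1/72 inch, so pixels = points * ppi / 72.
    return round(width_pt * ppi / 72), round(height_pt * ppi / 72)


# Hypothetical US Letter page (612 x 792 pt), chosen only for illustration.
for ppi in (72, 300, 600):
    width, height = raster_size(612, 792, ppi)
    print(ppi, width, height, width * height)
```

The pixel count grows with the square of the density, so going from 72 to 300 PPI means roughly 17 times as many pixels to compare.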
If you add a fuzz factor, you can tell ImageMagick to treat as identical all pixels that are no more than a certain color distance apart:
compare -fuzz '3%' reference.img similar.img -compose src delta.img
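The idea behind a fuzz factor can be sketched in plain Python. This is a simplified model and not ImageMagick's exact formula: it assumes 8-bit RGB tuples, Euclidean distance, and a tolerance given as a percentage of the largest possible distance. The function name and sample pixels are made up for illustration.

```python
import math


def differing_pixels(pixels_a, pixels_b, fuzz_percent=0.0):
    """Count pixel pairs whose RGB distance exceeds the tolerance.

    Simplified model (an assumption, not ImageMagick's exact formula):
    Euclidean distance in 8-bit RGB, tolerance as a percentage of the
    largest possible distance between two colors.
    """
    max_distance = math.sqrt(3 * 255 ** 2)
    threshold = max_distance * fuzz_percent / 100.0
    count = 0
    for a, b in zip(pixels_a, pixels_b):
        if math.dist(a, b) > threshold:
            count += 1
    return count


reference = [(255, 255, 255), (200, 0, 0), (0, 0, 0)]
similar = [(250, 252, 255), (200, 0, 0), (40, 40, 40)]
print(differing_pixels(reference, similar))       # strict comparison
print(differing_pixels(reference, similar, 3.0))  # small deviations tolerated
```

With the strict comparison the slightly off-white first pixel counts as different; with a 3% tolerance only the clearly changed last pixel does.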
pHash-ed difference value
More recent versions of ImageMagick support the phash algorithm:
compare -metric phash reference.img similar.img -compose src delta.img
Besides creating delta.img for visualization, this returns a numeric value that indicates the "difference" between the two images. The closer it is to 0, the more similar the two compared images are.
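For those who do want to drive this from Python, here is a rough sketch using subprocess. It assumes that a compare executable is on the PATH (on ImageMagick 7 the command may be magick compare instead), that the metric is printed to stderr, and that exit codes 0 and 1 mean "similar" and "dissimilar" while 2 means an error. The function names are made up for illustration, and null: is used as the output so that no delta image is written.

```python
import subprocess


def parse_metric(text):
    """Extract the leading number from the text that compare prints."""
    # Some metrics print extra text after the number, so keep only the first token.
    return float(text.strip().split()[0])


def phash_difference(reference, similar, delta="null:"):
    """Run `compare -metric phash` and return the reported value."""
    result = subprocess.run(
        ["compare", "-metric", "phash", reference, similar, delta],
        capture_output=True,
        text=True,
    )
    # compare exits with 0 (similar) or 1 (dissimilar); 2 signals an error.
    if result.returncode not in (0, 1):
        raise RuntimeError(result.stderr.strip())
    # The metric is written to stderr, not stdout.
    return parse_metric(result.stderr)
```

A call such as phash_difference('image1.jpg', 'image2.jpg') would then return the value as a float, which can be checked against whatever threshold suits your images.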
Examples:
Create a few small PDF pages with minor differences in them. I'm using Ghostscript:
gs -o ref1.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (0) show showpage"
gs -o ref2.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (0) show showpage"
gs -o ref3.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.0 10.0 moveto (O) show showpage"
gs -o ref4.pdf -sDEVICE=pdfwrite -g1050x1350 \
-c "/Courier findfont 160 scalefont setfont 10.1 10.1 moveto (O) show showpage"
Now compare ref1.pdf with ref3.pdf at the default resolution of 72 PPI:
compare -metric phash ref1.pdf ref3.pdf delta-ref1-ref3.pdf
7.61662
The returned pHash value is 7.61662. This indicates that ImageMagick's compare discovered at least some differences.
Let's look at the visualization. I'll create a side-by-side visualization of the three PDFs/images:
convert \
-mattecolor blue \
\( ref1.pdf -frame 2x2 \) \
null: \
\( ref3.pdf -frame 2x2 \) \
null: \
\( delta-ref1-ref3.pdf -frame 2x2 \) \
+append \
ref1-ref3-delta.png
In the resulting image, the different shapes of the 0 (digit 'zero') and the O (letter o, capital version) stand out quite well.
Now the next one, where ref1.pdf is compared to ref2.pdf, also at 72 PPI:
compare -metric phash ref1.pdf ref2.pdf delta-ref1-ref2.pdf
0
The returned pHash value now is 0. This indicates that ImageMagick discovered no difference!
Create a side-by-side visualization of the three PDFs/images:
convert \
-mattecolor blue \
\( ref1.pdf -frame 2x2 \) \
null: \
\( ref2.pdf -frame 2x2 \) \
null: \
\( delta-ref1-ref2.pdf -frame 2x2 \) \
+append \
ref1-ref2-delta.png
As you can see, at 72 PPI ImageMagick does not discover a difference between the two PDFs (which would be indicated by red pixels). According to the Ghostscript commands, both show the digit 0, but at positions that are shifted 0.1 pt apart in the x- and y-directions. So in reality, in the original PDFs, there IS a difference. But when rendered at 72 PPI, this difference isn't visible.
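A quick back-of-the-envelope calculation in Python helps explain this. It assumes nothing beyond the definition of a point as 1/72 inch; the function name shift_in_pixels is made up for illustration.

```python
def shift_in_pixels(shift_pt, ppi):
    """Convert an offset in points (1/72 inch) to pixels at the given PPI."""
    return shift_pt * ppi / 72


for ppi in (72, 600):
    print(ppi, round(shift_in_pixels(0.1, ppi), 3))
```

At 72 PPI the 0.1 pt shift is only a tenth of a pixel, while at 600 PPI it is close to a full pixel, which is far more likely to show up in the rendered image.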
Let's try to see the difference with density 600 then:
compare \
-metric phash \
-density 600 \
ref1.pdf \
ref2.pdf \
ref1-ref2-at-density600-delta.png
0.00172769
The returned pHash value at 600 PPI now is 0.00172769. This is close to zero, but still a difference. The difference is less than the one between ref1.pdf and ref3.pdf.
The difference is clearly highlighted now in the visual comparison, even though only by a thin line of red pixels.
Source: https://stackoverflow.com/questions/23982960/fast-and-efficient-way-to-detect-if-two-images-are-visually-identical-in-python