Question
I'm using cURL through proxies to download images with a scraper I developed.
Unfortunately, it occasionally gets an image that looks like these, and the last one is completely blank :/
- When I test the images with ImageMagick (using identify), it tells me they are valid images.
- When I test the images with exif_imagetype() and imagecreatefromjpeg(), both functions also report that the images are valid.
Does anyone have a way to determine whether an image is mostly grey or completely blank/white, i.e. whether these are indeed corrupted images?
I have checked a lot of other questions on here, but I haven't had much luck with their solutions, so please take care before marking this as a duplicate.
Thanks
After learning about imagecolorat(), I did a search, stumbled on some code and came up with this:
<?php
// Decode the downloaded JPEG; imagecreatefromjpeg() returns false if it cannot.
$file = dirname(__FILE__) . "/images/1.jpg";
$img = imagecreatefromjpeg($file);
if ($img === false)
{
    exit("image could not be decoded \n");
}
$imagew = imagesx($img);
$imageh = imagesy($img);

// Sample only the last 5 rows of the image.
$last_height = max(0, $imageh - 5);
$gray = 0;
$total = 0;
// Valid coordinates run from 0 to width-1 and height-1, so use < rather than <=
// (this also removes the need to silence imagecolorat() with @).
for ($x = 0; $x < $imagew; $x++)
{
    for ($y = $last_height; $y < $imageh; $y++)
    {
        $rgb = imagecolorat($img, $x, $y);
        $r = ($rgb >> 16) & 0xFF;
        $g = ($rgb >> 8) & 0xFF;
        $b = $rgb & 0xFF;
        $total++;
        // Treat a pixel as grey only if all three channels are 127, 128 or 129.
        if ($r >= 127 && $r <= 129 && $g >= 127 && $g <= 129 && $b >= 127 && $b <= 129)
        {
            $gray++;
        }
    }
}
$other = $total - $gray;
if ($gray > $other)
{
    echo "image corrupted \n";
}
else
{
    echo "image not corrupted \n";
}
?>
Does anyone see potential pitfalls with this? My idea is to take the last few rows of the image and compare the number of grey pixels (values of 127, 128 or 129) against the number of pixels of any other colour. If grey outnumbers the other colours, then the image is surely corrupted.
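The decision rule described above can be pulled out of the GD loop so it can be tested on plain arrays. This is a minimal sketch, not the asker's code: the function names isGreyPixel() and greyOutnumbersOthers() are hypothetical, the 127 to 129 band comes from the question, and each pixel is assumed to be an [r, g, b] array taken from a truecolor image.

```php
<?php
// Hypothetical helper: true if every channel of an [r, g, b] pixel lies in the grey band.
// Checking red alone would also count coloured pixels whose red happens to be ~128.
function isGreyPixel(array $rgb, int $lo = 127, int $hi = 129): bool
{
    foreach ($rgb as $channel) {
        if ($channel < $lo || $channel > $hi) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: true if grey pixels strictly outnumber all other pixels.
// An empty sample returns false (0 grey is not more than 0 other).
function greyOutnumbersOthers(array $pixels): bool
{
    $grey = 0;
    foreach ($pixels as $rgb) {
        if (isGreyPixel($rgb)) {
            $grey++;
        }
    }
    $other = count($pixels) - $grey;
    return $grey > $other;
}
```

Inside the existing loop, each sample could be collected as [($rgb >> 16) & 0xFF, ($rgb >> 8) & 0xFF, $rgb & 0xFF] and the resulting list passed to greyOutnumbersOthers().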
Opinions welcome! :)
Answer 1:
If the image it returns is a valid file, then I would recommend running the scrape twice (i.e. download it twice and check whether the two copies are the same).
Another option would be to check the last few pixels of the image (i.e. the bottom-right corner) to see whether they match that colour of grey exactly. If they do, re-download it. (Obviously this approach fails if you download an image that is actually supposed to be grey in that corner, in that exact colour, but checking several of the last pixels should reduce the chance of that to an acceptable level.)
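A minimal sketch of the "download twice and compare" idea, assuming both copies have already been saved to disk by the existing cURL code. The helper name sameDownload() is hypothetical; it only uses PHP's built-in sha1_file().

```php
<?php
// Hypothetical helper: compare two downloaded copies of the same image by SHA-1.
// sha1_file() returns false if a file cannot be read; treat that as a mismatch.
function sameDownload(string $firstPath, string $secondPath): bool
{
    $a = sha1_file($firstPath);
    $b = sha1_file($secondPath);
    if ($a === false || $b === false) {
        return false;
    }
    return $a === $b;
}
```

If the two hashes differ, that suggests at least one transfer was incomplete, and the image can be fetched again before any pixel checks are run.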
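The corner check could look roughly like this. It is a sketch under assumptions: bottomRightPixels() and allPixelsMatch() are hypothetical names, the default grey of (128,128,128) is borrowed from Answer 3 rather than stated here, and the image is assumed to be a truecolor GD image such as one returned by imagecreatefromjpeg().

```php
<?php
// Hypothetical helper: read the last $n pixels of the bottom row as [r, g, b] arrays.
// Assumes a truecolor GD image, so imagecolorat() returns packed RGB values.
function bottomRightPixels($img, int $n): array
{
    $w = imagesx($img);
    $y = imagesy($img) - 1;
    $pixels = [];
    for ($x = max(0, $w - $n); $x < $w; $x++) {
        $rgb = imagecolorat($img, $x, $y);
        $pixels[] = [($rgb >> 16) & 0xFF, ($rgb >> 8) & 0xFF, $rgb & 0xFF];
    }
    return $pixels;
}

// Hypothetical helper: true only if every sampled pixel is exactly the given grey.
function allPixelsMatch(array $pixels, array $grey = [128, 128, 128]): bool
{
    if (count($pixels) === 0) {
        return false; // nothing sampled, so no evidence of corruption
    }
    foreach ($pixels as $rgb) {
        if ($rgb !== $grey) {
            return false;
        }
    }
    return true;
}
```

Usage would be along the lines of `if (allPixelsMatch(bottomRightPixels($img, 10))) { /* re-download */ }`; sampling more pixels lowers the chance of a false positive on a genuinely grey corner.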
Answer 2:
I found this page while looking for a way to detect visually corrupted images like these. Here is a way to solve the problem using bash (the convert command line can easily be adapted for PHP or Python):
convert INPUTFILEPATH -gravity SouthWest -crop 20%x1%+0+0 -format %c -depth 8 histogram:info:- | sed '/^$/d' | sort -rn | head -n 1 | grep fractal | wc -l
It crops a small strip (20% of the width by 1% of the height) from the south-west (bottom-left) corner of the picture, then gets the histogram of that strip. If the main colour of the histogram is named "fractal" (ImageMagick's name for RGB 128,128,128) instead of being given as an RGB value, this zone is corrupted, so the output will be 1, and 0 otherwise.
Hope this helps!
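Since the answer says the command can be adapted for PHP, here is one hypothetical adaptation. It assumes ImageMagick's convert is on PATH and that each non-empty line of the histogram:info output starts with a pixel count followed by a colon, with the colour name at the end; dominantHistogramLine() and cornerLooksCorrupted() are illustrative names, and the "most pixels wins" selection replaces the sort/head part of the pipeline.

```php
<?php
// Hypothetical helper: return the histogram line with the largest pixel count, or null.
function dominantHistogramLine(string $histogram): ?string
{
    $best = null;
    $bestCount = -1;
    foreach (preg_split('/\R/', $histogram) as $line) {
        // Pull the leading count out of lines that begin with "<count>:".
        if (preg_match('/^\s*(\d+):/', $line, $m) && (int) $m[1] > $bestCount) {
            $bestCount = (int) $m[1];
            $best = trim($line);
        }
    }
    return $best;
}

// Hypothetical helper: run convert on the bottom-left strip and look for "fractal".
// The convert arguments follow the answer's command; the +0+0 offset asks for a
// single cropped region rather than a set of tiles.
function cornerLooksCorrupted(string $path): bool
{
    $cmd = 'convert ' . escapeshellarg($path)
        . ' -gravity SouthWest -crop 20%x1%+0+0 -format %c -depth 8 histogram:info:-';
    $histogram = shell_exec($cmd);
    if (!is_string($histogram)) {
        return false; // convert missing or produced no output
    }
    $line = dominantHistogramLine($histogram);
    return $line !== null && strpos($line, 'fractal') !== false;
}
```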
Answer 3:
I use this one. If 12 or more of the 25 pixels in the 5x5 bottom-right corner are exactly grey (128,128,128), the image is treated as broken. Images smaller than 500x200 are rejected as well.
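<?php
// Minimum accepted size; smaller images are rejected outright.
define('MIN_WIDTH', 500);
define('MIN_HEIGHT', 200);

function isGoodImage($fn)
{
    // getimagesize() returns false for unreadable or unrecognised files.
    $size = getimagesize($fn);
    if ($size === false) return false;
    list($w, $h) = $size;
    if ($w < MIN_WIDTH || $h < MIN_HEIGHT) return false;

    $im = imagecreatefromstring(file_get_contents($fn));
    if ($im === false) return false;

    // Count pixels in the bottom-right 5x5 block that are exactly grey (128,128,128).
    $grey = 0;
    for ($i = 0; $i < 5; ++$i) {
        for ($j = 0; $j < 5; ++$j) {
            $x = $w - 5 + $i;
            $y = $h - 5 + $j;
            $c = imagecolorsforindex($im, imagecolorat($im, $x, $y));
            if ($c['red'] == 128 && $c['green'] == 128 && $c['blue'] == 128)
                ++$grey;
        }
    }
    // Fewer than 12 grey pixels out of 25 means the image is considered good.
    return $grey < 12;
}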
Answer 4:
ImageMagick's identify command will detect far more corrupt images if you call it with the -verbose option. There is also a -regard-warnings option, which makes it treat warnings as errors. Try these against a bad image and see whether the result is a non-zero exit code.
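From PHP, the exit code can be read through exec(). This is a minimal sketch assuming ImageMagick's identify is on PATH; identifyAccepts() is a hypothetical name. Note that if identify is not installed, the shell's "command not found" status is also non-zero, so this sketch would report every file as bad in that case.

```php
<?php
// Hypothetical helper: run identify with -verbose and -regard-warnings and
// accept the file only if the command exits with status 0.
function identifyAccepts(string $path): bool
{
    $cmd = 'identify -verbose -regard-warnings ' . escapeshellarg($path) . ' 2>&1';
    $output = [];
    $status = 0;
    exec($cmd, $output, $status);
    return $status === 0;
}
```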
Source: https://stackoverflow.com/questions/8995096/php-determine-visually-corrupted-images-yet-valid-downloaded-via-curl-with-gd