I have a large number of JPEG thumbnail images ranging in size from 120x90 to 320x240, and I would like to classify them as either Real Life-like or Cartoon-like.
How mig
I guess your best bet is the ratio between the number of distinct colors (taken from the histogram) and the number of pixels. A cartoon-like image tends to have fewer colors than a real-life one.
You can use
COLORS=`convert picture.jpg -format %c histogram:info:- | wc -l`
to count how many colors the picture has. Then use commands like:
WIDTH=`jpeginfo picture.jpg | sed -r "s/.* ([0-9]+) x.*/\1/"`
and
HEIGHT=`jpeginfo picture.jpg | sed -r 's/.*x ([0-9]+) .*/\1/'`
to extract width and height.
Then use this command to find the ratio:
echo $WIDTH $HEIGHT $COLORS | awk '{ print $3/($1 * $2);}'
Then it is up to you to decide which ratio counts as cartoon-like and which does not. For cartoon-like images, the ratio is mostly lower than for real-life ones.
Just a thought.
EDIT: I just saw your comment that you don't want to know how to do it, just an existing solution. If so, just ignore my answer.
EDIT 2: I modified it a bit to make the results easier to see.
NOTE 1: Notice that I inverted the ratio (pixels per color instead of colors per pixel). The number of pixels is always much bigger than the number of colors, so the previous command produces very small numbers that are hard to tell apart.
NOTE 2: I also switched from "jpeginfo" to "identify", as jpeginfo can only handle JPEG files and is not part of ImageMagick.
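To illustrate NOTE 1, here is a small sketch that prints both forms of the ratio side by side. The color counts (5000 and 20000) are made-up numbers for a 320x240 thumbnail, not measurements, and `ratios` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print colors-per-pixel and pixels-per-color.
# $1 = width, $2 = height, $3 = number of distinct colors
ratios() {
    awk -v w="$1" -v h="$2" -v c="$3" \
        'BEGIN { printf "%.5f %.5f\n", c / (w * h), (w * h) / c }'
}

# Assumed color counts, for illustration only.
ratios 320 240 5000    # prints 0.06510 15.36000
ratios 320 240 20000   # prints 0.26042 3.84000
```

Both columns carry the same information, but the second one (pixels per color) gives numbers that are easier to read and compare.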
~/test/CheckCartoon.sh
#!/bin/sh
# Print the number of pixels per distinct color for one image,
# zero-padded so that the output sorts correctly as text.
IMAGE=$1
COLORS=`convert "$IMAGE" -format %c histogram:info:- | wc -l`
WIDTH=`identify "$IMAGE" | sed -r "s/.* ([0-9]+)x[0-9]+ .*/\1/"`
HEIGHT=`identify "$IMAGE" | sed -r 's/.* [0-9]+x([0-9]+) .*/\1/'`
RATIO=`echo $WIDTH $HEIGHT $COLORS | awk '{ print ($1 * $2)/$3;}'`
echo $RATIO | awk '{ printf "%020.5f",$1 }'
~/test/CheckAll.sh
#!/bin/sh
# Run CheckCartoon.sh on every readable image in ./images.
cd images
FILES=`ls`
for FILE in $FILES; do
    # Skip files that ImageMagick has no decoder for.
    IsIMAGE=`identify "$FILE" 2>&1 | grep " no decode delegate " | grep -o "no"`
    if [ "$IsIMAGE" = "no" ]; then continue; fi
    # Skip files whose header is not a valid image header.
    IsIMAGE=`identify "$FILE" 2>&1 | grep " Improper image header " | grep -o "Improper"`
    if [ "$IsIMAGE" = "Improper" ]; then continue; fi
    echo `../CheckCartoon.sh "$FILE"` $FILE
done
cd ..
Now, for testing, copy some files into the images folder:
Pic 1: ~/test/images/Cartoon-01.jpg
Pic 2: ~/test/images/Cartoon-02.png
Pic 3: ~/test/images/Cartoon-03.gif
Pic 4: ~/test/images/Real-01.jpg
Pic 5: ~/test/images/Real-02.jpg
Pic 6: ~/test/images/Real-03.jpg
http://dl.getdropbox.com/u/1961549/StackOverflow/SO1518347/Images.png
Then I ran ./CheckAll.sh | sort (in the test folder). Here is what I got:
00000000000003.31362 Real-03.jpg
00000000000004.61574 Real-02.jpg
00000000000009.89920 Cartoon-01.jpg
00000000000013.05870 Real-01.jpg
00000000000020.55470 Cartoon-03.gif
00000000000032.21900 Cartoon-02.png
As you can see, the result is generally good. You could use a number like 15 as the separation point: higher values are cartoon-like, lower values are real-life.
Cartoon-01.jpg is a drawing, but quite a realistic one, so it is easily confused with a photo. Also, Real-01.jpg is a picture of my girlfriend standing in front of an ocean, so its number of colors is lower than usual. It is no surprise that these two end up close to each other.
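The separation idea above can be sketched as a small filter over the output of CheckAll.sh. This is only an illustration: `classify` is a hypothetical helper name, the default of 15 is the rough cut-off mentioned above rather than a tuned value, and file names are assumed to contain no spaces.

```shell
#!/bin/sh
# Rough cut-off from the test run above; override via the environment.
THRESHOLD=${THRESHOLD:-15}

# Hypothetical helper: read "RATIO FILE" lines on stdin and print
# "cartoon FILE" when the ratio is above the threshold, else "real FILE".
classify() {
    awk -v t="$THRESHOLD" '{
        # $1 + 0 turns the zero-padded text into a number.
        label = ($1 + 0 > t) ? "cartoon" : "real"
        print label, $2
    }'
}
```

With the functions loaded, `./CheckAll.sh | classify` would label Cartoon-02.png and Cartoon-03.gif as cartoon and the other four test files as real, so Cartoon-01.jpg is the one miss at this threshold.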
What I have shown here is still a raw theory. If you really want a conclusive indication, you may have to find a number of metrics and combine them, for example the degree of local contrast.
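As one possible starting point for a second metric, here is a minimal sketch that uses ImageMagick's edge detection as a rough stand-in for local contrast. This is my assumption about how such a metric could be measured, not something I have tested on the images above; `edge_density` is a hypothetical helper name, and how its values separate cartoons from photos would have to be checked on your own data.

```shell
#!/bin/sh
# Hypothetical helper: mean edge strength of an image, normalized to 0..1.
# Assumes ImageMagick's convert is installed. "[0]" selects the first
# frame so that animated GIFs yield a single value.
edge_density() {
    convert "${1}[0]" -colorspace Gray -edge 1 -format '%[fx:mean]' info:
}
```

You could print this value next to the pixels-per-color ratio for each file and see whether the two together separate the groups better than either one alone.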
Hope this helps.