I have many PDF documents on my system, and I sometimes notice that some of them are image-based, with no editable text. In that case, I do OCR for better search in Foxit Phan
Late to the party, but here's a simple solution. It assumes that PDF files which already contain fonts are not purely image-based:
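# The filename is passed to bash as $1 (after the dummy $0 "_") instead of being
# pasted into the script text, so names containing quotes or $ are handled safely.
find ./ -name "*.pdf" -print0 | xargs -0 -I {} \
bash -c 'if [ "$(pdffonts "$1" 2> /dev/null | \
wc -l)" -lt 3 ]; then echo "$1"; fi' _ {}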
As a one-liner:
find ./ -name "*.pdf" -print0 | xargs -0 -I {} bash -c 'if [ "$(pdffonts "$1" 2> /dev/null | wc -l)" -lt 3 ]; then echo "$1"; fi' _ {}
Explanation:
pdffonts file.pdf
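prints a two-line header followed by one line per font, so it shows more than 2 lines if the PDF contains fonts (and therefore, most likely, text).
The command outputs the filenames of all PDF files that don't contain any fonts, i.e. files that are probably image-only. Note that if pdffonts fails to open a file (e.g. a damaged one), it prints nothing to stdout, so that file is listed as well.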
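The same line-count check can also be written as a small standalone script that skips pdffonts' two header lines and counts the remaining font rows. This is only a sketch assuming pdffonts (from poppler-utils) is on the PATH; the script name find_image_pdfs.sh and the function names count_font_rows and list_image_pdfs are hypothetical. Like the one-liner, it treats "no fonts" as "probably image-only", which is a heuristic rather than a guarantee.

```shell
#!/bin/sh
# find_image_pdfs.sh (hypothetical name): print the PDFs given as arguments
# for which pdffonts lists no fonts, i.e. files that are probably image-only.
# Assumes pdffonts (from poppler-utils) is installed.

# Count font rows in pdffonts output read from stdin.
# pdffonts prints a 2-line header (column titles + dashes), then one row per font.
count_font_rows() {
    tail -n +3 | wc -l | tr -d ' '
}

# Print each argument whose pdffonts output has no font rows.
# If pdffonts fails (e.g. on a damaged file) its stdout is empty, so the file
# is reported too, the same as with the one-liner.
list_image_pdfs() {
    for f in "$@"; do
        if [ "$(pdffonts "$f" 2>/dev/null | count_font_rows)" -eq 0 ]; then
            printf '%s\n' "$f"
        fi
    done
}

list_image_pdfs "$@"
```

Usage, after `chmod +x find_image_pdfs.sh`: `find ./ -name "*.pdf" -exec ./find_image_pdfs.sh {} +`. With `-exec ... {} +`, find passes many filenames per invocation, so the script starts far fewer times than a per-file `xargs -I {}` call, and the names arrive as arguments rather than being spliced into shell code.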
My OCR project, which has the same feature, is on GitHub: deajan/pmOCR.