I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one
A very low tech solution: any file that has scanned text will undoubtedly contain the letter "a" so do a search on all file contents that don't contain the letter a. i.e. "NOT a". Any file that shows up won't have been OCR'd