How to know if a PDF contains only images or has been OCR scanned for searching?

借酒劲吻你 2020-12-08 10:35

I have a bunch of PDF files that came from scanned documents. The files contain a mix of images and text. Some were scanned as images with no OCR, so each PDF page is one

7 answers
  • 2020-12-08 11:14

    A very low-tech solution: any file that contains OCR'd text will almost certainly contain the letter "a", so search the contents of all the files for those that do not contain the letter "a" (i.e. "NOT a"). Any file that shows up in the results has not been OCR'd.
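
The same check can be scripted. Below is a minimal sketch in Python, assuming the third-party `pypdf` package is installed for text extraction. The function names `looks_ocrd`, `extract_text_with_pypdf` and `find_image_only` are made up for this illustration and are not part of any library.

```python
from typing import Callable, Iterable, List


def looks_ocrd(text: str, probe: str = "a") -> bool:
    """Return True if the extracted text contains the probe letter (case-insensitive)."""
    return probe.lower() in text.lower()


def extract_text_with_pypdf(path: str) -> str:
    """Extract the text layer of every page (assumption: pypdf is installed)."""
    from pypdf import PdfReader  # imported lazily so the rest works without pypdf

    reader = PdfReader(path)
    # extract_text() can yield an empty result for image-only pages
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def find_image_only(
    paths: Iterable[str],
    extract: Callable[[str], str] = extract_text_with_pypdf,
) -> List[str]:
    """Return the paths whose extracted text has no letter "a", i.e. likely not OCR'd."""
    return [p for p in paths if not looks_ocrd(extract(p))]
```

Calling `find_image_only(["one.pdf", "two.pdf"])` with your own file names returns the files that probably have no text layer. The heuristic assumes the documents are in a language where "a" is common, and a file with only some pages OCR'd will still count as OCR'd.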
