Image processing to improve Tesseract OCR accuracy

鱼传尺愫 2020-11-22 14:41

I've been using Tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might improve OCR accuracy.

13 answers
  • 2020-11-22 15:34
    1. Fix the DPI if needed; 300 DPI is the minimum.
    2. Fix the text size (e.g. 12 pt should be OK).
    3. Try to straighten the text lines (deskew and dewarp the text).
    4. Try to fix the illumination of the image (e.g. no dark regions).
    5. Binarize and de-noise the image.
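
    Steps 1 and 5 can be sketched on a toy example. This is a minimal sketch, not a production pipeline: it assumes the page is already loaded as a 2-D list of 0-255 grayscale values with dark text on a light background (so it needs only the Python standard library), and the function names (dpi_scale_factor, otsu_threshold, median_denoise, preprocess) are hypothetical. Step 1 is reduced to computing a resize factor, and step 5 uses Otsu's global threshold plus a 3x3 median filter; deskewing, dewarping and illumination correction are left out. In practice you would do the same with OpenCV or Pillow.

```python
# Minimal sketch of steps 1 and 5. Images are plain 2-D lists of 0-255
# grayscale ints (dark text on light paper); all names are hypothetical.


def dpi_scale_factor(current_dpi, target_dpi=300):
    """Resize factor needed to bring a scan up to at least target_dpi."""
    return max(1.0, target_dpi / current_dpi)


def otsu_threshold(img):
    """Gray level that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    weight_bg = sum_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]           # pixels with value <= t
        sum_bg += t * hist[t]
        weight_fg = total - weight_bg  # pixels with value > t
        if weight_bg == 0:
            continue
        if weight_fg == 0:
            break
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def median_denoise(img):
    """3x3 median filter; border pixels use only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = sorted(
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
            row.append(window[len(window) // 2])
        out.append(row)
    return out


def preprocess(img):
    """De-noise, then binarize: pixels <= threshold -> 0 (ink), else 255."""
    clean = median_denoise(img)
    t = otsu_threshold(clean)
    return [[0 if v <= t else 255 for v in row] for row in clean]
```

    For a 150 DPI scan, dpi_scale_factor(150) returns 2.0, meaning the image should be upscaled 2x (with a smooth resampling filter) before thresholding. Otsu picks a single global threshold, so it tends to work best after the lighting has been evened out as in step 4; the median filter removes isolated specks but can also erode very thin strokes.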

    There is no universal command line that fits all cases (sometimes you need to blur and sharpen the image), but you can give TEXTCLEANER from Fred's ImageMagick Scripts a try.
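
    The blur-then-sharpen idea can be illustrated with a small unsharp mask: blur the image, then add back the difference between the original and the blurred copy to boost edge contrast. This is a minimal sketch on a 2-D list of 0-255 grayscale values using only the standard library; box_blur and unsharp_mask are hypothetical names, the 3x3 box blur and amount=1.0 are arbitrary choices, and it is not what TEXTCLEANER does internally.

```python
# Minimal unsharp-mask sketch on 2-D lists of 0-255 grayscale ints.
# Function names and parameter defaults are hypothetical.


def box_blur(img):
    """3x3 mean filter; edge pixels average only the neighbours that exist."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            vals = [
                img[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            ]
            row.append(round(sum(vals) / len(vals)))
        out.append(row)
    return out


def unsharp_mask(img, amount=1.0):
    """Sharpen by adding back the detail a blur removes, clamped to 0-255."""
    blurred = box_blur(img)
    return [
        [max(0, min(255, round(v + amount * (v - b)))) for v, b in zip(row, brow)]
        for row, brow in zip(img, blurred)
    ]
```

    For example, unsharp_mask([[100, 100, 150, 150]]) returns [[100, 83, 167, 150]]: the step between the two gray levels grows from 50 to 84, which can make faint strokes easier to separate from the background. Use a lower amount on noisy scans, since sharpening amplifies noise as well.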

    If you are not a fan of the command line, you could try the open-source scantailor.sourceforge.net or the commercial bookrestorer.
