Image processing to improve Tesseract OCR accuracy

鱼传尺愫 2020-11-22 14:41

I've been using Tesseract to convert documents into text. The quality of the documents ranges wildly, and I'm looking for tips on what sort of image processing might impr

13 Answers
  •  伪装坚强ぢ
    2020-11-22 15:13

    What was EXTREMELY HELPFUL to me along the way was the source code of the Capture2Text project: http://sourceforge.net/projects/capture2text/files/Capture2Text/

    BTW: Kudos to its author for sharing such a painstaking algorithm.

    Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c. That is the essence of the image preprocessing for this utility.

    If you run the binaries, you can compare the image before and after the transformation in the Capture2Text\Output\ folder.

    P.S. The solution mentioned above uses Tesseract for OCR and Leptonica for preprocessing.
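
The answer does not list the actual steps performed in leptonica_util.c, so the following is not Capture2Text's algorithm. As a rough illustration of what preprocessing before OCR can look like, here is a minimal pure-Python sketch of one common step, global binarization with Otsu's method. It assumes an 8-bit grayscale image stored as a list of rows of integers in 0..255; the function names `otsu_threshold` and `binarize` are hypothetical and chosen only for this example.

```python
def otsu_threshold(pixels):
    """Return the gray level t that maximizes between-class variance.

    Pixels with value <= t form the dark class, the rest the light class.
    `pixels` is a flat iterable of ints in 0..255 (an assumption of this sketch).
    """
    hist = [0] * 256
    total = 0
    for p in pixels:
        hist[p] += 1
        total += 1

    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels in the dark class so far
    sum0 = 0    # sum of gray levels in the dark class so far

    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue            # dark class still empty
        w1 = total - w0
        if w1 == 0:
            break               # light class empty, nothing left to split
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var = w0 * w1 * (mean0 - mean1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t


def binarize(image):
    """Map each pixel to 0 (dark, e.g. text) or 255 (light, e.g. paper)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat)
    return [[255 if p > t else 0 for p in row] for row in image]


if __name__ == "__main__":
    # Tiny synthetic "scan": dark strokes (about 30) on a light page (220).
    sample = [
        [30, 30, 220],
        [220, 35, 220],
    ]
    for row in binarize(sample):
        print(row)
```

A single global threshold like this tends to work only when the lighting is even across the page. Documents with shadows or uneven backgrounds usually need a locally adaptive threshold instead, which is the kind of work a library such as Leptonica is meant to handle.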
