image processing to improve tesseract OCR accuracy

前端未结

关注

 13  1652

鱼传尺愫 2020-11-22 14:41

I\'ve been using tesseract to convert documents into text. The quality of the documents ranges wildly, and I\'m looking for tips on what sort of image processing might impr

13条回答

伪装坚强ぢ (楼主)

2020-11-22 15:13

What was EXTREMLY HELPFUL to me on this way are the source codes for Capture2Text project. http://sourceforge.net/projects/capture2text/files/Capture2Text/.

BTW: Kudos to it's author for sharing such a painstaking algorithm.

Pay special attention to the file Capture2Text\SourceCode\leptonica_util\leptonica_util.c - that's the essence of image preprocession for this utility.

If you will run the binaries, you can check the image transformation before/after the process in Capture2Text\Output\ folder.

P.S. mentioned solution uses Tesseract for OCR and Leptonica for preprocessing.

0 讨论(0)

查看其它13个回答
发布评论:

提交评论
- 加载中...