Tesseract has trouble reading this extremely simple string of numbers

Posted by 偶尔善良 on 2019-12-12 12:33:11

Question


I'm currently writing a script in Python that requires the use of Tesseract to read a number from an image, in this case 6.881.

Using the digits-only configuration and -psm 6 (or 7), it outputs 5.551.
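For reference, here is a minimal sketch of how that call might look from Python. It assumes the tesseract binary is on the PATH and that the stock "digits" config file is installed; the function names and the file names in the test are hypothetical and only there for illustration. The flag is spelled -psm as in this post, while newer Tesseract releases spell it --psm.

```python
import subprocess


def tesseract_digits_command(image_path, output_base, psm=7):
    """Build the argv list for a digits-only Tesseract run."""
    # "-psm" is the spelling used in this post; newer releases use "--psm".
    # "digits" names the stock config file that limits recognition to
    # numeric characters (assumed to be installed with Tesseract).
    return ["tesseract", image_path, output_base, "-psm", str(psm), "digits"]


def run_digits_ocr(image_path, output_base, psm=7):
    """Run Tesseract and return the recognized text, whitespace stripped."""
    subprocess.run(tesseract_digits_command(image_path, output_base, psm), check=True)
    # Tesseract appends ".txt" to the output base name.
    with open(output_base + ".txt") as f:
        return f.read().strip()
```

Building the argument list in its own function makes it easy to switch between -psm 6 and -psm 7 and compare the results.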

I have had some success with other numbers (5.700 works), but this particular number is giving me a ton of problems. Unfortunately, I need a high degree of accuracy for my program, and I thought Tesseract would be able to decipher such a simple string.

I have also tried GOCR, which correctly read 6.881 (yay!) but gave the output 5._00 for 5.700 (boo!).

Any idea why it would be doing this?

Or, more importantly, is there anything I can do to get around the problem (preferably without having to train Tesseract)?


Answer 1:


I doubled the image's size and removed the transparency (replacing it with white) using ImageMagick (you can use something else if you want), and Tesseract OCR'd the enhanced image correctly:

$ convert I1Zau.png -background white -flatten -resize 200% I1Zau_2.png
$ tesseract I1Zau_2.png o.txt
$ cat o.txt.txt 
6.881
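Since the question mentions a Python script, the same two steps could be driven from Python roughly as follows. This is only a sketch: it assumes both convert and tesseract are on the PATH (on ImageMagick 7 the binary is usually called magick instead), and the function names are made up for illustration.

```python
import subprocess


def flatten_and_enlarge_command(src, dst, scale_percent=200):
    """Build the ImageMagick argv that flattens onto white and enlarges."""
    return [
        "convert", src,
        "-background", "white",  # colour used in place of transparent pixels
        "-flatten",              # merge the image onto that background
        "-resize", f"{scale_percent}%",
        dst,
    ]


def ocr_enlarged(src, enlarged, output_base):
    """Preprocess with ImageMagick, OCR with Tesseract, return the text."""
    subprocess.run(flatten_and_enlarge_command(src, enlarged), check=True)
    subprocess.run(["tesseract", enlarged, output_base], check=True)
    # Tesseract appends ".txt" to the output base, hence "o.txt.txt" above.
    with open(output_base + ".txt") as f:
        return f.read().strip()
```

Calling ocr_enlarged("I1Zau.png", "I1Zau_2.png", "o.txt") would mirror the commands above.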



Answer 2:


Welcome to the world of OCR! Unfortunately, even simple cases like this can be problematic for a basic OCR application. One workaround I have used with some success is to enlarge the image (using ImageMagick) before feeding it to Tesseract. This only works up to a point. You could also try the standard gambit of morphological operations on the image.
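To give an idea of what a morphological operation does, here is a small pure-Python sketch of a closing (dilation followed by erosion) on a binary image stored as a list of rows, with 1 meaning ink. The representation, the 3x3 neighbourhood, and the function names are assumptions made for illustration; in practice you would more likely use an image library.

```python
def _neighbourhood(grid, y, x):
    """Yield the in-bounds pixels of the 3x3 window centred on (y, x)."""
    height, width = len(grid), len(grid[0])
    for ny in range(max(0, y - 1), min(height, y + 2)):
        for nx in range(max(0, x - 1), min(width, x + 2)):
            yield grid[ny][nx]


def dilate(grid):
    """A pixel becomes ink if any pixel in its 3x3 window is ink."""
    return [[int(any(_neighbourhood(grid, y, x))) for x in range(len(grid[0]))]
            for y in range(len(grid))]


def erode(grid):
    """A pixel stays ink only if every in-bounds pixel in its window is ink."""
    return [[int(all(_neighbourhood(grid, y, x))) for x in range(len(grid[0]))]
            for y in range(len(grid))]


def closing(grid):
    """Dilate then erode: fills small gaps without thickening strokes much."""
    return erode(dilate(grid))


# A horizontal stroke with a one-pixel break in the middle.
stroke = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
repaired = closing(stroke)
```

Here the closing bridges the break in the stroke, which is the kind of cleanup that can help when thin or broken glyphs confuse the OCR engine.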

Depending on your overall requirements (will the digits always be in this font and size, will the backgrounds be noisy, etc.), you might want to manually make each digit a separate image to make sure that Tesseract can handle the font you are using. If it cannot work on single digits, it is unlikely to work on anything else you pass it.
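If cutting the digits out by hand gets tedious, one simple way to automate it is to split the image wherever a column contains no ink. The sketch below assumes a clean binary image stored as a list of rows (1 meaning ink) and uses hypothetical function names. Note that it treats the decimal point as its own glyph and will not separate digits that touch each other.

```python
def split_columns(grid):
    """Return (start, end) column ranges, end exclusive, for each run of ink."""
    width = len(grid[0])
    has_ink = [any(row[x] for row in grid) for x in range(width)]
    spans, start = [], None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                 # a new glyph begins here
        elif not ink and start is not None:
            spans.append((start, x))  # an empty column ends the glyph
            start = None
    if start is not None:
        spans.append((start, width))  # glyph touching the right edge
    return spans


def crop_glyphs(grid):
    """Cut the image into one sub-image per run of inked columns."""
    return [[row[a:b] for row in grid] for a, b in split_columns(grid)]


sample = [
    [1, 1, 0, 1, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1],
]
glyphs = crop_glyphs(sample)
```

Each entry in glyphs could then be saved as its own image and passed to Tesseract to check which characters it gets wrong.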




Answer 3:


The image resolution is way too low. Simply rescaling it to 300 DPI produced the correct result for me.
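As a rough sketch of the arithmetic, the resize factor follows from the ratio of the target DPI to the image's current DPI. The current DPI is not stated in this thread, so the 72 DPI in the example is purely an assumption (a common value for screen captures), and the function names are hypothetical. The convert command is the same ImageMagick tool used in Answer 1.

```python
def resize_percent(source_dpi, target_dpi=300):
    """Percentage to scale by so the image ends up at target_dpi."""
    return round(100 * target_dpi / source_dpi)


def rescale_command(src, dst, source_dpi, target_dpi=300):
    """Build the ImageMagick argv that enlarges src to roughly target_dpi."""
    percent = resize_percent(source_dpi, target_dpi)
    return ["convert", src, "-resize", f"{percent}%", dst]


# Assumed example: an image captured at 72 DPI.
command = rescale_command("number.png", "number_300dpi.png", 72)
```

The resulting list can be passed to subprocess.run before handing the enlarged file to Tesseract.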



Source: https://stackoverflow.com/questions/19951598/tesseract-has-trouble-reading-this-extremely-simple-string-of-numbers
