Make tesseract recognise numbers only

Posted by 旧街凉风 on 2021-02-07 06:15:32

Question


I am trying to refine an OCR program I made to read the layout of a particular image that I am using. Right now, I would like my OCR program to recognise only the digits 0-9.

I tried to follow the solution from the question:

Limit characters tesseract is looking for

But I got stuck at the part where I have to call tesseract as:

tesseract input.tif output nobatch letters

Where does this go?


Answer 1:


I posted some notes about tesseract on SO some time ago: see "Tesseract OCR Library - Learning Font". Notably, it links to the tesseract training documentation, which will tell you how to restrict your set of characters and describe your character ambiguities.




Answer 2:


I had the same issue using Python with Tesseract 3, and I assume other readers may run into it too.

Based on this FAQ entry: https://github.com/tesseract-ocr/tesseract/wiki/FAQ#how-do-i-recognize-only-digits

and this line of the pytesseract source: https://github.com/madmaze/pytesseract/blob/27fed535bf1eb665ec991313841b177336b50f61/src/pytesseract.py#L91

I succeeded using:

pytesseract.image_to_string(someimage, config='outputbase digits')
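
A slightly fuller sketch of that call, in case it helps. The `config` string is the one that worked for me above. The helper names `keep_digits` and `read_digits` are made up for illustration, and the post-filtering step is an extra safety net of my own rather than anything the FAQ prescribes. It assumes pytesseract and Pillow are installed and that the tesseract binary is on your PATH.

```python
DIGITS = "0123456789"


def keep_digits(text):
    """Drop every character that is not an ASCII digit (spaces, newlines, stray letters)."""
    return "".join(ch for ch in text if ch in DIGITS)


def read_digits(image_path, config="outputbase digits"):
    """OCR the image at image_path with the digits config and return only the digits."""
    # Imported here so keep_digits() stays usable without the OCR dependencies.
    from PIL import Image
    import pytesseract

    with Image.open(image_path) as image:
        # Same call as in the answer above; only the image loading is added.
        raw = pytesseract.image_to_string(image, config=config)
    # Hypothetical extra step: strip anything tesseract returned that is not 0-9.
    return keep_digits(raw)
```

You would call it as `read_digits("someimage.png")`, where the file name is just a placeholder. Note that the filter also removes whitespace, so drop that step if you need to keep line breaks between numbers.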




Answer 3:


This question is answered in the Tesseract FAQ.

And here is how you can get tesseract to recognise numbers only:

Tesseract 2 - Set the whitelist variable below BEFORE calling an Init function (when using tesseract as a library), or put this in a text file called tessdata/configs/digits:

tessedit_char_whitelist 0123456789

and then your command line becomes:

tesseract image.tif outputbase nobatch digits

Tesseract 3 - A digits config file is already created, so just run a tesseract command like this:

tesseract imagename outputbase digits
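
If you drive the command line from a script, the two invocations above can be wrapped roughly like this. This is only a sketch: the function names and the `tesseract_major` parameter are hypothetical, and it assumes the `tesseract` binary is on your PATH. It relies on tesseract writing its result to `<outputbase>.txt`.

```python
import subprocess


def digits_command(image, output_base, tesseract_major=3):
    """Build the argument list for a digits-only tesseract run."""
    if tesseract_major == 2:
        # Tesseract 2 takes the extra "nobatch" argument before the config name.
        return ["tesseract", image, output_base, "nobatch", "digits"]
    # Tesseract 3 ships a digits config, so the config name alone is enough.
    return ["tesseract", image, output_base, "digits"]


def run_digits_ocr(image, output_base, tesseract_major=3):
    """Run tesseract and return the text it wrote to <output_base>.txt."""
    subprocess.run(digits_command(image, output_base, tesseract_major), check=True)
    with open(output_base + ".txt", encoding="utf-8") as fh:
        return fh.read()
```

For example, `run_digits_ocr("imagename", "outputbase")` runs the Tesseract 3 form of the command and returns the recognised text.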



Answer 4:


It is the command you use to run tesseract on the command line.

For a better answer, we need to know whether you are running tesseract on the command line or as a library.



Source: https://stackoverflow.com/questions/11304286/make-tesseract-recognise-numbers-only
