I am trying to use tesseract-2.04 in my iPhone application and just want to detect the numbers. What I am doing here is first I am cross compiling tesseract to generate lib file
I get quite good results setting
TessBaseAPI::SetVariable("tessedit_char_whitelist", "0123456789");
while gently urging the user to let the numbers fit in a certain box. This makes locating the numbers easier for me, and ensures the user keeps the image steady and at a reasonable distance leading to a sharper image.
I have thought about altering valid_word() in tesseract-2.04/dict/permute.cpp, but there seems to be no need for that.
The next step will be to hardcode a minimum/maximum char size so recognition time can become way less than the 500 ms it is now. Then the next step will be to add some code that keeps track of results in time, so that reading 5
90% of the time and 8
only 10% will lead the code to remember the 5
.
It all depends on the use case you have. I'm lucky in the sense that I'm allowed to just show a 200x50 box which will contain the number.