Tesseract training for a new font

后端未结

关注

 3  1956

I\'m still new to Tesseract OCR and after using it in my script noticed it had a relatively big error rate for the images I was trying to extract text from. I came across Tesser

相关标签:

3条回答

广开言路

2021-01-31 10:45

For anyone that is still going to read this, you can use this tool to get a traineddata file of whichever font you want. After that move the traineddata file in your tessdata folder. To use tesseract with the new font in Python or any other language (I think?) put lang = "Font"as second parameter in image_to_string function. It improves accuracy significantly but can still make mistakes ofcourse. Or you can just learn how to train tesseract for a new font manually with this guide: http://pretius.com/how-to-prepare-training-files-for-tesseract-ocr-and-improve-characters-recognition/.

0 讨论(0)
发布评论:

提交评论
- 加载中...
别跟我提以往

2021-01-31 10:48

I made a video tutorial explaining the process for the latest version of Tesseract (The LSTM model), hope it helps. https://www.youtube.com/watch?v=TpD76k2HYms

0 讨论(0)
发布评论:

提交评论
- 加载中...
既然无缘

2021-01-31 11:05

If you want to train tesseract with the new font, then generate .traineddata file with your desired font. For generating .traineddata, first you will need .tiff file and .box file. You can create these files using jTessBoxEditor. Tutorial for jBossTextEditor is here. While making .tiff file you can set the font in which you have train tesseract. Either you can jTessBoxEditor for generating .traineddata or serak-tesseract-trainer is also there. I have used both and I would say that for generating tiff and box files jTessBoxEditor is great and for training tesseract use serak.

0 讨论(0)
发布评论:

提交评论
- 加载中...