Tesseract does not recognize single characters

夕颜

How to reproduce:

1. Create a new image in Paint (any size).
2. Add the letter A to the image.
3. Run Tesseract on the image: it does not find any letters.

4 answers

悲哀的现实

2020-12-06 10:38
Have you seen this?

https://code.google.com/p/tesseract-ocr/issues/detail?id=581

The issue tracker marks it as "no longer an issue".
- Be sure to use high-resolution images.
- If you resize the image, keep the DPI high and don't shrink it too much.
- Be sure to train your Tesseract system.
- Call baseApi.setVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); before initializing Tesseract.
- Also look into which font you use with OCR.
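The resizing advice above can be sketched in Python roughly as follows. This is only an illustration: the helper names `scale_factor` and `upscale_for_ocr` are made up for this example, and the 100-pixel minimum height is an assumed starting point, not a value documented by Tesseract. Tune it for your own images.

```python
import math


def scale_factor(height_px: int, min_height_px: int = 100) -> int:
    """Return the integer factor that brings height_px up to at least min_height_px.

    min_height_px is an assumed threshold, not a Tesseract constant.
    """
    if height_px <= 0:
        raise ValueError("height_px must be positive")
    return max(1, math.ceil(min_height_px / height_px))


def upscale_for_ocr(img, min_height_px: int = 100):
    """Enlarge a small image (NumPy array as returned by cv2.imread) before OCR."""
    import cv2  # imported lazily so scale_factor works without OpenCV installed

    factor = scale_factor(img.shape[0], min_height_px)
    if factor == 1:
        return img  # already tall enough, leave untouched
    # Cubic interpolation keeps glyph edges smoother than nearest-neighbour.
    return cv2.resize(img, None, fx=factor, fy=factor,
                      interpolation=cv2.INTER_CUBIC)
```

Upscaling never shrinks the image, which matches the "don't resize too small" tip; whether it actually helps depends on the font and the source resolution.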
粉色の甜心

2020-12-06 10:41
You must set the "page segmentation mode" to "single char".

For example, in Android you do the following:
```
api.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_CHAR);
```
醉话见心

2020-12-06 10:52
Python code for that configuration looks like this:
```
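import cv2
import pytesseract

img = cv2.imread("path to some image")
# OpenCV loads images as BGR; pytesseract expects RGB.
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# --psm 10 treats the image as a single character.
# The whitelist restricts the output to letters and digits.
# "eng" is assumed here; "osd" only provides orientation/script detection.
text = pytesseract.image_to_string(
    img,
    lang="eng",
    config="--psm 10 -c tessedit_char_whitelist="
           "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789",
)
print(text.strip())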
```
The --psm flag sets the page segmentation mode.

According to the Tesseract documentation, mode 10 means:

Treat the image as a single character.

So to recognize a single character, you only need the --psm 10 flag.
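For context, the default mode (3) performs fully automatic page segmentation, which is why an image containing one lone letter tends to come back empty. A small sketch of choosing the mode by layout is shown below. The mode numbers come from Tesseract's documented --psm list, but the names `PSM_BY_LAYOUT`, `build_config`, and `recognize` are hypothetical helpers invented for this example, and `lang="eng"` is an assumption.

```python
# Page segmentation modes taken from Tesseract's documented --psm list.
PSM_BY_LAYOUT = {
    "page": 3,   # fully automatic page segmentation (the default)
    "block": 6,  # a single uniform block of text
    "line": 7,   # a single text line
    "word": 8,   # a single word
    "char": 10,  # a single character
}


def build_config(layout: str, whitelist: str = "") -> str:
    """Build the config string passed to pytesseract (hypothetical helper)."""
    config = f"--psm {PSM_BY_LAYOUT[layout]}"
    if whitelist:
        config += f" -c tessedit_char_whitelist={whitelist}"
    return config


def recognize(img, layout: str = "char", whitelist: str = "", lang: str = "eng"):
    """Run OCR on img with the segmentation mode that matches its layout."""
    import pytesseract  # imported lazily so build_config works on its own

    text = pytesseract.image_to_string(
        img, lang=lang, config=build_config(layout, whitelist)
    )
    return text.strip()
```

For the single-letter image from the question, `recognize(img)` uses mode 10; an image holding one word would use `recognize(img, layout="word")` instead.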
无人共我

2020-12-06 10:53

You need to set Tesseract's page segmentation mode to "single character."
