Tesseract does not recognize single characters

夕颜 2020-12-06 10:26

How to reproduce:

  1. Create a new image in Paint (any size).
  2. Add the letter A to this image.
  3. Try to recognize it -> Tesseract will not find any letters.
4 Answers
  • 2020-12-06 10:38

    Have you seen this?

    https://code.google.com/p/tesseract-ocr/issues/detail?id=581

    The bug list shows it as "no longer an issue".

    • Be sure to use high-resolution images.
    • If you resize the image, keep a high DPI and don't make it too small.
    • Be sure to train your Tesseract system.
    • Call baseApi.setVariable("tessedit_char_whitelist", "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"); before initializing Tesseract.
    • Also look into which font to use for OCR.
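The resizing advice above can be sketched in Python. This is only an illustration: the 300 DPI target follows the commonly cited Tesseract recommendation and is an assumption here, the helper names `upscaled_size` and `upscale_image` are hypothetical, and the resize step assumes Pillow is installed.

```python
import math

# Assumed target resolution for this sketch; adjust for your own images.
TARGET_DPI = 300


def upscaled_size(width, height, source_dpi, target_dpi=TARGET_DPI):
    """Return the (width, height) that brings an image up to target_dpi.

    The image is never shrunk: if source_dpi is already high enough,
    the original size is returned.
    """
    scale = max(1.0, target_dpi / source_dpi)
    return math.ceil(width * scale), math.ceil(height * scale)


def upscale_image(path_in, path_out, source_dpi):
    """Resize the image at path_in and save it to path_out (needs Pillow)."""
    from PIL import Image  # imported lazily so the helper above has no dependencies

    with Image.open(path_in) as img:
        new_size = upscaled_size(img.width, img.height, source_dpi)
        resized = img.resize(new_size, Image.LANCZOS)
        resized.save(path_out, dpi=(TARGET_DPI, TARGET_DPI))
```

For a 96 DPI screenshot you would call `upscale_image("letter.png", "letter_300dpi.png", source_dpi=96)` and pass the new file to Tesseract; both file names are placeholders.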
  • 2020-12-06 10:41

    You must set the "page segmentation mode" to "single char".

    For example, in Android you do the following:

    api.setPageSegMode(TessBaseAPI.PageSegMode.PSM_SINGLE_CHAR);
    
  • 2020-12-06 10:52

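    The Python code for that configuration looks like this:

    import cv2
    import pytesseract

    # Load the image that contains the single character.
    img = cv2.imread("path to some image")

    # --psm 10 treats the image as a single character; the whitelist
    # restricts the output to letters and digits.
    config = (
        "--psm 10 "
        "-c tessedit_char_whitelist="
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"
    )
    # "osd" is the orientation/script-detection data, not a recognition
    # language, so a real language model such as "eng" is used here.
    text = pytesseract.image_to_string(img, lang="eng", config=config)
    print(text.strip())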
    

    The --psm flag sets the page segmentation mode.

    According to the Tesseract documentation, 10 means:

    Treat the image as a single character.

    So to recognize a single character, you just need the --psm 10 flag.
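If you would rather not depend on pytesseract, the same flags can be passed straight to the `tesseract` command-line tool. The sketch below assumes Tesseract 4 or later is on your PATH (older 3.x releases spell the flag `-psm`) and that the English model is installed; the helper names `build_tesseract_args` and `recognize_single_char` are made up for this example.

```python
import string
import subprocess

# Letters and digits, the same set used in the whitelist above.
WHITELIST = string.ascii_letters + string.digits


def build_tesseract_args(image_path, psm=10, lang="eng", whitelist=WHITELIST):
    """Build the argument list for the tesseract CLI.

    Passing "stdout" as the output base makes tesseract print the
    recognized text instead of writing it to a file.
    """
    return [
        "tesseract", image_path, "stdout",
        "-l", lang,
        "--psm", str(psm),  # 10 = treat the image as a single character
        "-c", f"tessedit_char_whitelist={whitelist}",
    ]


def recognize_single_char(image_path):
    """Run tesseract on image_path and return the recognized text."""
    result = subprocess.run(
        build_tesseract_args(image_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if tesseract fails
    )
    return result.stdout.strip()
```

Calling `recognize_single_char("letter_a.png")` (a placeholder file name) returns whatever Tesseract reads from the image. Printing `build_tesseract_args(...)` first is a quick way to check the exact command before running it.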

  • 2020-12-06 10:53

    You need to set Tesseract's page segmentation mode to "single character."
