How can I run tesseract with multiple languages at once?

隐瞒了意图╮ 2021-02-12 15:48

I have to analyze an image that contains both English and Japanese text. When I run tesseract with the default language (-l eng), some Japanese characters are lost. Otherwise, if …

2 Answers
  • 2021-02-12 16:28

    Since tesseract 3.02, it has been possible to specify multiple languages for the -l parameter.

    -l lang The language to use. If none is specified, English is assumed. Multiple languages may be specified, separated by plus characters. Tesseract uses 3-character ISO 639-2 language codes.

    An example:

    tesseract myscan.png out -l deu+eng
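If you drive tesseract from a script, the same idea can be wrapped as below. This is a minimal Python sketch, assuming the `tesseract` binary is on PATH; the helper names `build_command`, `parse_list_langs` and `run_multilang` are hypothetical. It joins language codes with `+` exactly as in the command above, and (optionally) checks `tesseract --list-langs` first so a missing traineddata file fails loudly instead of silently dropping characters. The assumption that some versions print that list to stderr rather than stdout is why both streams are read.

```python
from __future__ import annotations

import shutil
import subprocess

HEADER_PREFIX = "List of available languages"


def build_command(image: str, outbase: str, langs: list[str]) -> list[str]:
    """Return argv for one tesseract run that loads every code in `langs`."""
    if not langs:
        raise ValueError("at least one language code is required")
    # Multiple languages are joined with '+', e.g. "deu+eng".
    return ["tesseract", image, outbase, "-l", "+".join(langs)]


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from `tesseract --list-langs` output."""
    codes = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if line and not line.startswith(HEADER_PREFIX):
            codes.append(line)
    return codes


def run_multilang(image: str, outbase: str, langs: list[str]) -> None:
    """Hypothetical helper: verify language data is installed, then run once."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract not found on PATH")
    listing = subprocess.run(
        ["tesseract", "--list-langs"], capture_output=True, text=True
    )
    # Assumption: depending on version, the list may go to stdout or stderr.
    installed = set(parse_list_langs(listing.stdout + listing.stderr))
    missing = [code for code in langs if code not in installed]
    if missing:
        raise ValueError(f"language data not installed: {missing}")
    subprocess.run(build_command(image, outbase, langs), check=True)


if __name__ == "__main__":
    print(build_command("myscan.png", "out", ["deu", "eng"]))
    # → ['tesseract', 'myscan.png', 'out', '-l', 'deu+eng']
```

For the question's image, `run_multilang("image.png", "out", ["eng", "jpn"])` would be the equivalent of `tesseract image.png out -l eng+jpn`, provided `jpn.traineddata` is installed.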
    
  • 2021-02-12 16:44

    Try this:

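    from PIL import Image
    import pytesseract
    from langdetect import detect_langs

    img = Image.open('image.png')  # placeholder path for your image

    # Load English and Japanese together; --psm 6 assumes a single uniform block of text
    custom_config = r'-l eng+jpn --psm 6'
    txt = pytesseract.image_to_string(img, config=custom_config)

    # Estimate which languages appear in the recognized text
    print(detect_langs(txt))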
    

    Note: langdetect is a separate package; install it with:

     pip install langdetect
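The question's actual symptom is Japanese characters going missing. As a lightweight complement to langdetect (a swapped-in technique, not part of the answer above), you can count how many characters of the OCR output fall in the Hiragana, Katakana and CJK ideograph Unicode blocks and compare runs with `-l eng` and `-l eng+jpn` on the same image. This is a rough sketch: the helper names are hypothetical, and the CJK ideograph block is shared with Chinese, so it signals "CJK text present" rather than proving Japanese.

```python
# Unicode blocks used as a rough "Japanese" signal. Assumption: the CJK
# ideograph range is shared with Chinese, so this cannot tell them apart.
JAPANESE_RANGES = (
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs (kanji)
)


def is_japanese_char(ch: str) -> bool:
    """True if `ch` lies in one of the Japanese-related blocks above."""
    code = ord(ch)
    return any(lo <= code <= hi for lo, hi in JAPANESE_RANGES)


def script_summary(text: str) -> dict:
    """Count Japanese-block characters and ASCII letters in OCR output."""
    return {
        "japanese": sum(1 for ch in text if is_japanese_char(ch)),
        "latin": sum(1 for ch in text if ch.isascii() and ch.isalpha()),
    }


if __name__ == "__main__":
    # Made-up sample string; in practice pass the text returned by
    # pytesseract.image_to_string for each language setting.
    print(script_summary("Hello 世界 こんにちは"))
    # → {'japanese': 7, 'latin': 5}
```

If the `eng+jpn` run reports a clearly higher `japanese` count than the `eng` run while `latin` stays about the same, the extra language data is recovering the characters that were being lost.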
    