Pytesseract: UnicodeDecodeError: 'charmap' codec can't decode byte

一整个雨季 2020-12-21 02:25

I'm running a large number of OCR passes on screenshots with Pytesseract. This works well in most cases, but a small number of images cause this error:

pytesserac         


        
2 answers
  • 2020-12-21 02:43

    Use Unidecode to transliterate the OCR output to ASCII:

    from PIL import Image
    from unidecode import unidecode
    import pytesseract

    # OCR the image, then transliterate any non-ASCII characters to ASCII
    strs = pytesseract.image_to_string(Image.open('binarized_image.png'))
    strs = unidecode(strs)
    print(strs)
    
  • 2020-12-21 02:49

    Make sure you are using the right decoding options.
    See https://docs.python.org/3/library/codecs.html#standard-encodings

    In Python 3, only bytes objects have a decode method (str does not):

    bytes.decode('utf-8')
    bytes.decode('cp950') for Traditional Chinese, etc.
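
As a rough illustration of choosing a decoding explicitly, the sketch below tries a list of candidate encodings on raw bytes and falls back to replacement characters instead of raising. The helper name `decode_with_fallback` and the candidate list are assumptions made for this example; they are not part of Pytesseract, and the right encodings depend on where your bytes come from.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp950")):
    """Hypothetical helper: try each candidate encoding in order.

    If none of them decodes the bytes cleanly, decode as UTF-8 and
    substitute U+FFFD for undecodable bytes rather than raising.
    """
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    return raw.decode("utf-8", errors="replace")


# The same text encoded two different ways decodes to the same string.
utf8_bytes = "中文".encode("utf-8")
big5_bytes = "中文".encode("cp950")
print(decode_with_fallback(utf8_bytes) == decode_with_fallback(big5_bytes))
```

Order matters here: a permissive single-byte codec such as cp1252 or latin-1 will accept almost any input, so if you add one to the list it should probably go last.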
