I'm running a large number of OCRs on screenshots with Pytesseract. This works well in most cases, but a small number of images cause this error:
pytesserac
Use Unidecode
from PIL import Image
from unidecode import unidecode
import pytesseract

# OCR the image, then transliterate any non-ASCII characters to plain ASCII
strs = pytesseract.image_to_string(Image.open('binarized_image.png'))
strs = unidecode(strs)
print(strs)
Make sure you are using the right decoding options.
See https://docs.python.org/3/library/codecs.html#standard-encodings
bytes.decode('utf-8') (in Python 3 only bytes objects have .decode(); str does not)
bytes.decode('cp950') for Traditional Chinese, etc.
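If you do not know in advance which encoding the raw bytes are in, one option is to try a few candidates in order. This is only a rough sketch using the standard library: the helper name decode_with_fallback and the default candidate list are made up for illustration, and it assumes you already have the output as a bytes object (it is not part of pytesseract).

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp950")):
    """Try each candidate encoding in order; return (text, encoding_used).

    Hypothetical helper for illustration. The candidate list is an
    assumption: put the encodings you actually expect first.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
    # Last resort: keep going, but mark undecodable bytes with U+FFFD
    return raw.decode("utf-8", errors="replace"), "utf-8 (replaced)"


text, used = decode_with_fallback("你好".encode("cp950"))
print(text, used)
```

Order matters here: a permissive encoding listed first can "succeed" on bytes it was never meant for, so stricter encodings such as utf-8 should come before looser ones.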