I have to analyze an image that contains both English and Japanese text. When I run tesseract with the default language (-l eng), some Japanese characters are lost. Otherwise, if
-l eng
Try this:
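import pytesseract
from langdetect import detect_langs

# img is the image you have already loaded (for example a PIL Image)
custom_config = r'-l eng+jpn --psm 6'  # load both languages; PSM 6 treats the image as one uniform block of text
txt = pytesseract.image_to_string(img, config=custom_config)
detect_langs(txt)  # returns a list of detected languages with probabilities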
Note: you have to install langdetect by using:
pip install langdetect
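
For reference, here is a minimal end-to-end sketch of the same idea as a small script. The helper names `build_config` and `ocr_and_detect` and the file name `sample.png` are made up for illustration and are not part of pytesseract or langdetect. It also assumes that Pillow is installed and that the Japanese language data (`jpn`) is available to your Tesseract installation; otherwise Tesseract cannot load that language.

```python
def build_config(langs, psm=6):
    """Build a Tesseract config string such as '-l eng+jpn --psm 6'.

    Hypothetical helper: langs is a sequence of Tesseract language codes.
    """
    return "-l " + "+".join(langs) + f" --psm {psm}"


def ocr_and_detect(image_path, langs=("eng", "jpn"), psm=6):
    """Run OCR with several languages, then guess the languages of the result.

    Hypothetical helper. The third-party imports live inside the function so
    that build_config can be used without the OCR stack installed.
    """
    from PIL import Image
    import pytesseract
    from langdetect import detect_langs

    img = Image.open(image_path)
    txt = pytesseract.image_to_string(img, config=build_config(langs, psm))

    # Skip detection when OCR produced nothing. This is only a partial guard:
    # langdetect can still raise on text with no usable features.
    if not txt.strip():
        return txt, []
    return txt, detect_langs(txt)


if __name__ == "__main__":
    # 'sample.png' is a placeholder path; replace it with your own image.
    text, languages = ocr_and_detect("sample.png")
    print(text)
    print(languages)
```

Joining the language codes with `+` is what lets Tesseract use both models in a single pass, so you do not have to run the OCR once per language and merge the results.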