I am using tesseract for OCR, via the pytesseract bindings. Unfortunately, I encounter difficulties when trying to extract text including subscript-sty
This is because the subscript characters are too small for Tesseract to recognize reliably. You can upscale the image with a Python package such as cv2 or PIL and run OCR on the resized image, as in the code below.
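import cv2
import pytesseract

img = cv2.imread('test.jpg')  # returns None if the file cannot be read
if img is None:
    raise FileNotFoundError('test.jpg')
# Upscale by a factor of 2 along both axes so the subscript is large enough to recognize
img = cv2.resize(img, None, fx=2, fy=2)
data = pytesseract.image_to_string(img)
print(data)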
OUTPUT:
CH3
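
The answer mentions PIL as an alternative to cv2, so here is a minimal sketch of the same upscaling step done with Pillow. The helper names upscale and ocr_upscaled are hypothetical, and the LANCZOS resampling filter is my assumption rather than something from the code above; the best factor and filter will likely depend on your image.

```python
from PIL import Image


def upscale(img, factor=2):
    """Return a copy of img enlarged by `factor` in both dimensions."""
    new_size = (int(img.width * factor), int(img.height * factor))
    # LANCZOS is an assumed choice of resampling filter, not taken from the answer above
    return img.resize(new_size, Image.LANCZOS)


def ocr_upscaled(path, factor=2):
    """Hypothetical helper: load an image, upscale it, and run Tesseract on it."""
    # Imported here so that upscale() can be used without Tesseract installed
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(upscale(img, factor))
```

Calling print(ocr_upscaled('test.jpg')) should behave much like the cv2 version. If a factor of 2 is not enough, trying 3 or 4 is a reasonable next step.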