How to detect subscript numbers in an image using OCR?

抹茶落季 2021-02-14 11:46

I am using tesseract for OCR, via the pytesseract bindings. Unfortunately, I encounter difficulties when trying to extract text including subscript-sty

3 answers
  •  误落风尘
    2021-02-14 12:45

    This happens because the subscript characters are too small for Tesseract to recognize. You can upscale the image with a Python package such as cv2 (OpenCV) or PIL and run OCR on the resized image, as in the code below.

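    import cv2
    import pytesseract

    # cv2.imread returns None instead of raising when the file cannot be read.
    img = cv2.imread('test.jpg')
    if img is None:
        raise FileNotFoundError('test.jpg could not be read')

    # Enlarge the image 2x in both directions so the subscript is bigger.
    img = cv2.resize(img, None, fx=2, fy=2)  # scaling factor = 2

    data = pytesseract.image_to_string(img)
    print(data)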
    

    OUTPUT:

    CH3
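
The answer mentions PIL as an alternative to cv2 for the resizing step. A minimal sketch of that variant might look like the following. The helper names `upscale` and `ocr_upscaled` are hypothetical and only for illustration, the LANCZOS filter is an assumption rather than something the answer specifies, and whether a factor of 2 is enough will depend on the image.

```python
from PIL import Image


def upscale(img: Image.Image, factor: int = 2) -> Image.Image:
    """Return a copy of img enlarged by `factor` in both dimensions."""
    new_size = (img.width * factor, img.height * factor)
    # LANCZOS is one of Pillow's resampling filters; it is an assumption here,
    # so compare it against other filters on your own images.
    return img.resize(new_size, Image.LANCZOS)


def ocr_upscaled(path: str, factor: int = 2) -> str:
    """Hypothetical helper: upscale the image at `path`, then run OCR on it."""
    # Imported here so upscale() stays usable without Tesseract installed.
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(upscale(img, factor))
```

Calling `print(ocr_upscaled('test.jpg'))` would mirror the cv2 version above; the recognized text is not guaranteed to be identical, since the two libraries resample differently.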
    
