PyTesseract OCR unable to read digits from a simple image

Submitted by 梦想与她 on 2020-01-11 10:57:33

Question


I'm trying to get PyTesseract OCR to read the digits in this simple, well-cropped image, but for some reason it returns nothing.

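from PIL import Image
import pytesseract as p

def obtain_balance(path):
    im = Image.open(path)
    width, height = im.size
    # print(width, height)
    offset = 300*5 - 120
    # Crop box is (left, top, right, bottom) in pixels
    left = 155 + offset
    top = 5
    right = 360 + offset
    bottom = 120
    m1 = im.crop((left, top, right, bottom))
    # --psm 13: treat the image as a single raw text line
    # --oem 3: default OCR engine mode
    # tessedit_char_whitelist: only allow digits in the result
    text = p.image_to_string(m1, lang='eng', config='--psm 13 --oem 3 -c tessedit_char_whitelist=0123456789').split()
    print(text)
    m1.show()
    return text

obtain_balance('cur.jpg')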

Output:

[]
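Before tuning Tesseract, it may be worth ruling out the crop itself. With the offsets above, the box works out to (1535, 5, 1740, 120), a 205 x 115 region that only exists if the screenshot is at least 1740 pixels wide; Pillow's crop() pads an out-of-range box instead of raising an error. The sketch below just redoes that arithmetic in plain Python. The helper names crop_box and box_fits, and the 1920x1080 and 1280x720 sizes, are hypothetical and not from the question.

```python
# Hypothetical helpers that reproduce the question's crop-box arithmetic.
def crop_box(offset: int = 300 * 5 - 120) -> tuple:
    """Return (left, top, right, bottom) exactly as obtain_balance computes it."""
    return (155 + offset, 5, 360 + offset, 120)


def box_fits(box: tuple, width: int, height: int) -> bool:
    """True if the box is non-empty and lies entirely inside a width x height image."""
    left, top, right, bottom = box
    return 0 <= left < right <= width and 0 <= top < bottom <= height


if __name__ == "__main__":
    box = crop_box()
    print(box)                        # (1535, 5, 1740, 120)
    print(box_fits(box, 1920, 1080))  # True: fits a 1920x1080 screenshot
    print(box_fits(box, 1280, 720))   # False: right edge exceeds the width
```

With a real image, width, height = im.size (already computed in obtain_balance) would supply the last two arguments.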

Answer 1:


When performing OCR, it is important to preprocess the image so that the desired foreground text is black on a white background. To do this, we can use OpenCV to apply Otsu's threshold to the image and obtain a binary image. We then apply a slight Gaussian blur to smooth the image before passing it to pytesseract. We use the --psm 6 config to treat the image as a single uniform block of text. Other page segmentation modes can be listed by running tesseract --help-psm.
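To make the thresholding step less of a black box, here is a minimal pure-Python sketch of Otsu's method on a flat list of 0-255 grey values. It tries every threshold, keeps the one that maximises the between-class variance, and then applies an inverted binary threshold in the same sense as cv2.THRESH_BINARY_INV (pixels above the threshold become 0, the rest become 255), which is why light text on a dark background ends up black on white. This is an illustration only, not OpenCV's implementation; otsu_threshold and binarize_inv are hypothetical names, and in real code cv2.threshold with cv2.THRESH_OTSU does this for you.

```python
def otsu_threshold(pixels: list) -> int:
    """Return the grey level t (0-255) that maximises between-class variance,
    treating pixels <= t as one class and pixels > t as the other."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels <= t
    sum0 = 0  # sum of grey levels of pixels <= t
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, so this split is meaningless
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def binarize_inv(pixels: list, t: int, maxval: int = 255) -> list:
    """Mimic THRESH_BINARY_INV: pixels above t become 0, the rest become maxval."""
    return [0 if p > t else maxval for p in pixels]


if __name__ == "__main__":
    # Dark background (10) with light text (200), as in a dark-themed UI
    pixels = [10] * 50 + [200] * 50
    t = otsu_threshold(pixels)
    print(t)  # 10
    print(binarize_inv(pixels, t)[:3], binarize_inv(pixels, t)[-3:])  # [255, 255, 255] [0, 0, 0]
```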


Here's the preprocessed image and the result from pytesseract:

PRACTICE ACCOUNT
$9,047.26~ i

Code

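import cv2
import pytesseract

# Windows only: point pytesseract at the Tesseract executable if it is not on PATH
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

# Load the screenshot as a single-channel grayscale image
image = cv2.imread('1.png', cv2.IMREAD_GRAYSCALE)
if image is None:
    raise FileNotFoundError('1.png')
# Otsu picks the threshold automatically; _INV maps pixels brighter than it to black,
# so light text on a dark background becomes black text on white
thresh = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Slight blur to smooth jagged character edges
thresh = cv2.GaussianBlur(thresh, (3, 3), 0)
# --psm 6: assume a single uniform block of text
data = pytesseract.image_to_string(thresh, lang='eng', config='--psm 6')
print(data)

# Show the preprocessed image until a key is pressed
cv2.imshow('thresh', thresh)
cv2.waitKey()
cv2.destroyAllWindows()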
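The text pytesseract returns here, $9,047.26~ i under the PRACTICE ACCOUNT label, still contains the label and stray characters. Since the question's goal is a numeric balance, one option is to pull the first dollar amount out of that text with a regular expression. This is a minimal sketch, assuming the balance is the first "$"-prefixed number in the OCR output; parse_balance and BALANCE_RE are hypothetical names, not part of pytesseract.

```python
import re
from decimal import Decimal
from typing import Optional

# Hypothetical pattern (assumption): a "$", optional spaces, then digits with
# optional thousands separators and an optional decimal part.
BALANCE_RE = re.compile(r"\$\s*(\d[\d,]*(?:\.\d+)?)")


def parse_balance(ocr_text: str) -> Optional[Decimal]:
    """Return the first dollar amount in ocr_text as a Decimal, or None."""
    match = BALANCE_RE.search(ocr_text)
    if match is None:
        return None
    # Drop thousands separators before converting; Decimal avoids float rounding
    return Decimal(match.group(1).replace(",", ""))


if __name__ == "__main__":
    print(parse_balance("PRACTICE ACCOUNT\n$9,047.26~ i"))  # 9047.26
    print(parse_balance("no amount here"))                  # None
```

Decimal is used rather than float so that a currency value like 9047.26 is kept exactly.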


Source: https://stackoverflow.com/questions/59237973/pytesseract-ocr-unable-to-read-digits-from-a-simple-image
