Recognize numbers from an image python

Submitted by 假如想象 on 2021-02-07 10:23:24

Question


I am trying to extract numbers from in-game screenshots.


I'm trying to extract:

98
3430
5/10

from PIL import Image
import pytesseract

image = "D:/img/New folder (2)/1.png"
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
text = pytesseract.image_to_string(Image.open(image), lang='eng', config='--psm 5')
print(text)

The output is gibberish:

‘t hl) keteeeees
ek pSlaerenen
JU) pgrenmnreserenny
Rates B
d dali eas. 5
cle aM (Sores
|, S| pgranmrerererecons
a cee 3
pea 3
oS :
(geo eenee
ey
=
es A

Answer 1:


Okay, so I tried converting the image to grayscale, inverting the contrast, and using different thresholds, but the results were all fairly inaccurate. The issue seems to be the tilted and smaller numbers. Do you happen to have a higher-resolution image? The most accurate result I could get was with the following code.

import cv2
import pytesseract
import imutils

pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
img = cv2.imread('D:/img/New folder (2)/1.png')  # path to your original image
img = imutils.resize(img, width=1400)
crop = img[340:530, 100:400]

data = pytesseract.image_to_string(crop,config=' --psm 1 --oem 3  -c tessedit_char_whitelist=0123456789/')
print(data)

cv2.imshow('crop', crop)
cv2.waitKey()

Otherwise I recommend one of these methods as described in the similar question or in this one.




Answer 2:


If the text is surrounded by decorative designs, Tesseract struggles a lot.

Instead of Tesseract, try using findContours in OpenCV (after a little blurring and dilating).

You will get bounding boxes, and some of them may cover that text as well.



Source: https://stackoverflow.com/questions/60586672/recognize-numbers-from-an-image-python
