Question
I am trying to extract numbers from in-game screenshots.
Specifically, I want to extract:
98
3430
5/10
from PIL import Image
import pytesseract
image="D:/img/New folder (2)/1.png"
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files/Tesseract-OCR/tesseract.exe'
text = pytesseract.image_to_string(Image.open(image),lang='eng',config='--psm 5')
print(text)
The output is gibberish:
‘t hl) keteeeees
ek pSlaerenen
JU) pgrenmnreserenny
Rates B
d dali eas. 5
cle aM (Sores
|, S| pgranmrerererecons
a cee 3
pea 3
oS :
(geo eenee
ey
=
es A
Answer 1:
OK, so I tried converting it to grayscale, inverting the contrast, and using different thresholds, but the results were all fairly inaccurate. The issue seems to be the tilted and smaller numbers. Do you happen to have a higher-resolution image? The most accurate result I could get was with the following code.
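import cv2
import pytesseract
import imutils

# Point pytesseract at the Tesseract executable (Windows install path)
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"

img = cv2.imread('D:/img/New folder (2)/1.png')  # your original image
# Upscale so the small digits are easier to recognize
img = imutils.resize(img, width=1400)
# Crop to the region that contains the numbers (rows 340-530, columns 100-400)
crop = img[340:530, 100:400]
# Restrict recognition to digits and the slash
data = pytesseract.image_to_string(crop, config='--psm 1 --oem 3 -c tessedit_char_whitelist=0123456789/')
print(data)

# Show the cropped region for visual inspection
cv2.imshow('crop', crop)
cv2.waitKey()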
Otherwise I recommend one of these methods as described in the similar question or in this one.
Answer 2:
If the text is surrounded by decorative designs, Tesseract struggles a lot.
Instead of Tesseract, try using findContours in OpenCV (after a little blurring and dilating).
You will get bounding boxes, and some of them may cover the text as well.
Source: https://stackoverflow.com/questions/60586672/recognize-numbers-from-an-image-python