Above is the image. I have tried everything I could find on SO or Google, and nothing seems to work. I cannot get the exact value in the image: I should get 2.10, Inste
I was able to increase the number of correct decimals by using the methods mentioned in the other answers. Yet, a small share of the decimals were not recognized correctly.
The solution I found was to change the language setting for pytesseract.
I was using a non-English setting, but changing the config to lang='eng' fixed all remaining issues.
Not sure what the reason is, but with the new LSTM engine for Tesseract, the training data is probably mostly English.
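One way to check whether the language setting is the culprit is to run the same image through several language models and compare the output. The sketch below is only an illustration: `compare_languages` and its `ocr` parameter are hypothetical names introduced here, and `"deu"` is just an example of a non-English language code (the setting actually used above is not stated). The only library call is `pytesseract.image_to_string(image, lang=..., config=...)`, and each language needs its traineddata file installed.

```python
def compare_languages(image, langs=("eng", "deu"), config="", ocr=None):
    """Run OCR once per language code and return {lang: recognized_text}.

    `ocr` is any callable with the signature ocr(image, lang=..., config=...).
    It defaults to pytesseract.image_to_string; it is a parameter only so the
    comparison logic can be tried without Tesseract installed.
    """
    if ocr is None:
        import pytesseract  # imported lazily; requires the Tesseract binary
        ocr = pytesseract.image_to_string
    # Strip trailing newlines/form feeds so results are easy to compare.
    return {lang: ocr(image, lang=lang, config=config).strip() for lang in langs}
```

Calling `compare_languages(img, langs=("eng", "deu"), config="--psm 12")` on a loaded image would then show side by side where the two models disagree, for example on the decimal separator.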
Sometimes tesseract is oddly sensitive to image size. You can often get better results by scaling your image.
I scaled your image by a factor of 2 and I got good results.
import cv2
import pytesseract
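# On Windows, point pytesseract at the Tesseract executable if it is not on PATH:
# pytesseract.pytesseract.tesseract_cmd = 'C:\\Program Files (x86)\\Tesseract-OCR\\tesseract.exe'
img = cv2.imread('twoten.png', 0)  # flag 0 loads the image as grayscale
img = cv2.resize(img, (0, 0), fx=2, fy=2)  # upscale by a factor of 2 in both directions
config = "--psm 12"  # sparse text with orientation and script detection
data = pytesseract.image_to_string(img, lang='eng', config=config)
print(data)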
which gave this in a console:
$2.10
Before throwing the image into Pytesseract, some preprocessing to clean and smooth the image helps. Here's a simple approach:
First, we convert the image to grayscale, resize it using the imutils library, then threshold to obtain a binary image.
Next, we perform morphological transformations to smooth the image.
Finally, we invert the image for Pytesseract and add a Gaussian blur.
We use the --psm 10 config flag since we want to treat the image as a single character. Here are some additional configuration flags that could be useful.
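As a rough illustration of how such flags can be combined, the sketch below builds a config string from a page segmentation mode and an optional character whitelist. `build_config` and `PSM_MODES` are hypothetical names introduced here; the mode descriptions paraphrase the output of `tesseract --help-psm`, and `tessedit_char_whitelist` is a Tesseract config variable that, as far as I know, is ignored by the LSTM engine in Tesseract 4.0 and honored again from 4.1.

```python
# Subset of Tesseract page segmentation modes (see `tesseract --help-psm`).
PSM_MODES = {
    6: "Assume a single uniform block of text",
    7: "Treat the image as a single text line",
    8: "Treat the image as a single word",
    10: "Treat the image as a single character",
    11: "Sparse text",
    12: "Sparse text with OSD",
    13: "Raw line",
}


def build_config(psm=7, whitelist=None):
    """Return a config string for pytesseract's `config=` argument."""
    if not 0 <= psm <= 13:
        raise ValueError("psm must be between 0 and 13")
    parts = [f"--psm {psm}"]
    if whitelist:
        # Restrict recognition to the given characters, e.g. digits and '.'
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)
```

For a short price such as the one in the question, `build_config(7, "0123456789.")` (single text line, digits only) may be worth trying alongside `--psm 10`; which mode works best depends on the image.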
Results
$2.10
After filtering
2.10
import cv2
import pytesseract
import imutils
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
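image = cv2.imread('1.png', 0)  # flag 0 loads the image as grayscale
image = imutils.resize(image, width=300)  # resize to 300 px wide, keeping the aspect ratio
# Inverse binary threshold: pixels at or below 150 become white, the rest black
thresh = cv2.threshold(image, 150, 255, cv2.THRESH_BINARY_INV)[1]
# Morphological close to smooth the thresholded image
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel)
# Invert the image for Pytesseract, then blur
result = 255 - close
result = cv2.GaussianBlur(result, (5, 5), 0)
data = pytesseract.image_to_string(result, lang='eng', config='--psm 10')
# Keep only numeric characters and the decimal point
processed_data = ''.join(char for char in data if char.isnumeric() or char == '.')
print(data)
print(processed_data)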
cv2.imshow('thresh', thresh)
cv2.imshow('close', close)
cv2.imshow('result', result)
cv2.waitKey()
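The character filter above joins every numeric character in the output, so stray digits elsewhere in the text would be glued onto the value, and `str.isnumeric()` also accepts characters such as "²". A slightly stricter alternative, sketched here as an assumption rather than as part of the original answer, is to pull out the first decimal number with a regular expression. `extract_amount` is a hypothetical helper name.

```python
import re

# One or more ASCII digits, optionally followed by a decimal part.
_NUMBER = re.compile(r"[0-9]+(?:\.[0-9]+)?")


def extract_amount(text):
    """Return the first decimal number found in OCR output, or None."""
    match = _NUMBER.search(text)
    return match.group(0) if match else None
```

With the raw OCR string from the results above, `extract_amount("$2.10")` returns `"2.10"`, and text without any digits yields `None` instead of an empty string.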