I'm trying to write an application to find the numbers inside an image and add them up.
How can I identify the written number in an image?
Here's a simple approach:
Obtain binary image. Load the image, convert it to grayscale, then apply Otsu's threshold to get a 1-channel binary image whose pixels are either 0 or 255.
Detect horizontal and vertical lines. Create long, thin horizontal and vertical structuring elements, then extract the lines onto separate masks with a morphological opening.
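Remove horizontal and vertical lines. Combine the horizontal and vertical masks using a bitwise_or operation, then remove the lines by applying a bitwise_not operation restricted to the combined mask, which inverts only the line pixels so the dark lines turn light.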
Perform OCR. Apply a slight Gaussian blur then OCR using Pytesseract.
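To see why a long, thin structuring element picks out the table lines but not the digits, here is a toy pure-Python sketch of a morphological opening on a single row of binary pixels. The function name `open_row` and the sample row are made up for illustration, and the border handling is simplified compared with what OpenCV does, so treat it as an intuition aid rather than a description of `cv2.morphologyEx`.

```python
def open_row(row, k):
    """Toy 1-D morphological opening with a flat kernel of width k.

    An opening keeps exactly the pixels covered by some k-wide window
    that fits entirely inside the foreground (value 1).
    """
    n = len(row)
    out = [0] * n
    for start in range(n - k + 1):
        # Erosion step: does a full k-wide window of foreground fit here?
        if all(row[start:start + k]):
            # Dilation step: paint that window back into the output.
            for j in range(start, start + k):
                out[j] = 1
    return out


# Short runs (like digit strokes) vanish; the long run (like a table line) survives.
row = [1, 1, 0, 1, 1, 1, 1, 1, 0, 1]
print(open_row(row, 4))  # [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]
```

With a 25-pixel-wide kernel the same idea applies in 2-D: anything narrower than 25 pixels in that direction is erased, so only the long lines remain in the mask. If the digits in your image are larger than that, the kernel length would need to grow accordingly.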
Here's a visualization of each step (images not shown): input image -> binary image -> horizontal mask -> vertical mask -> combined masks -> result -> result with slight blur applied.
Result from OCR
38
18
78
I implemented it in Python, but you can adapt a similar approach in Java:
import cv2
import pytesseract
pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
# Load image, grayscale, Otsu's threshold
image = cv2.imread('1.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Detect horizontal lines
horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))
horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
# Detect vertical lines
vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,25))
vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=1)
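# Remove horizontal and vertical lines: invert only the pixels under the
# combined mask so the dark lines turn light (this writes into `image` in place)
lines = cv2.bitwise_or(horizontal, vertical)
result = cv2.bitwise_not(image, image, mask=lines)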
# Perform OCR with Pytesseract
result = cv2.GaussianBlur(result, (3,3), 0)
data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
print(data)
# Display
cv2.imshow('thresh', thresh)
cv2.imshow('horizontal', horizontal)
cv2.imshow('vertical', vertical)
cv2.imshow('lines', lines)
cv2.imshow('result', result)
cv2.waitKey()
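The question also asks to add the numbers up, which the script above does not do. A minimal sketch of that last step, assuming the OCR text contains only non-negative whole numbers separated by whitespace or newlines, could look like this. The helper name `sum_ocr_numbers` is hypothetical, and the sample string stands in for the `data` value returned by `pytesseract.image_to_string`.

```python
import re


def sum_ocr_numbers(text):
    """Return the integers found in OCR output text and their sum."""
    # Every run of digits is treated as one number; stray non-digit
    # characters that OCR may produce are ignored.
    numbers = [int(token) for token in re.findall(r"\d+", text)]
    return numbers, sum(numbers)


# Sample text matching the OCR result shown above
numbers, total = sum_ocr_numbers("38\n18\n78\n")
print(numbers, total)  # [38, 18, 78] 134
```

If the image can contain decimals or negative values, the regular expression would need to be extended, and it is worth checking the parsed list against the image, since OCR can misread digits.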