Recognize a number from an image

-上瘾入骨i 2021-01-31 08:53

I'm trying to write an application that finds the numbers in an image and adds them up.

How can I identify the numbers written in an image?

6 answers
  •  北海茫月
    2021-01-31 09:11

    Here's a simple approach:

    1. Obtain binary image. Load the image, convert it to grayscale, then apply Otsu's threshold to get a 1-channel binary image in which every pixel is either 0 or 255.

    2. Detect horizontal and vertical lines. Create a long, thin horizontal structuring element and a vertical one, then apply a morphological opening with each to extract the lines into separate masks.

    3. Remove horizontal and vertical lines. Combine the horizontal and vertical masks using a bitwise_or operation, then remove the lines by inverting the pixels under the combined mask with a masked bitwise_not operation, which turns the dark lines white.

    4. Perform OCR. Apply a slight Gaussian blur then OCR using Pytesseract.


    Here's a visualization of each step:

    Input image -> Binary image -> Horizontal mask -> Vertical mask

    Combined masks -> Result -> Applied slight blur

    Result from OCR

    38
    18
    78
    

    I implemented it in Python, but you can adapt a similar approach in Java.

    import cv2
    import pytesseract
    
    pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
    
    # Load image, grayscale, Otsu's threshold
    image = cv2.imread('1.png')
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
    
    # Detect horizontal lines
    horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (25,1))
    horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=1)
    
    # Detect vertical lines
    vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1,25))
    vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=1)
    
    # Remove horizontal and vertical lines: merge both masks, then invert
    # the (dark) line pixels under the mask in place so they turn white
    lines = cv2.bitwise_or(horizontal, vertical)
    result = cv2.bitwise_not(image, image, mask=lines)
    
    # Perform OCR with Pytesseract
    result = cv2.GaussianBlur(result, (3,3), 0)
    data = pytesseract.image_to_string(result, lang='eng', config='--psm 6')
    print(data)
    
    # Display
    cv2.imshow('thresh', thresh)
    cv2.imshow('horizontal', horizontal)
    cv2.imshow('vertical', vertical)
    cv2.imshow('lines', lines)
    cv2.imshow('result', result)
    cv2.waitKey()
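
The question also asks to add the numbers up, so the text returned by Pytesseract still has to be parsed. Here is a minimal sketch, assuming the OCR output looks like the sample result above (one unsigned integer per line). `sum_numbers` is a hypothetical helper introduced for illustration; it is not part of Pytesseract or OpenCV.

```python
import re

def sum_numbers(ocr_text):
    """Return the sum of every run of digits found in the OCR text."""
    # \d+ matches each unsigned integer; any stray non-digit characters are skipped
    return sum(int(token) for token in re.findall(r"\d+", ocr_text))

# Sample text standing in for the `data` string produced by the script above
sample = "38\n18\n78\n"
print(sum_numbers(sample))  # 134
```

In the script above this would be called as `sum_numbers(data)`. The pattern only handles unsigned integers; decimals or negative numbers would need a different regular expression.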
    
