How to Segment handwritten and printed digit without losing information in opencv?

问题

I've written an algorithm that would detect printed and handwritten digit and segment it but while removing outer rectangle handwritten digit is lost using clear_border from ski-image package. Any suggestion to prevent information.

Sample:

How to get all 5 characters separately?

回答1:

Segmenting characters from the image -

Approach -

Threshold the image (Convert it to BW)
Perform Dilation
Check the contours are large enough
Find rectangular Contours
Take ROI and save the characters

Python Code -

# import the necessary packages
import numpy as np
import cv2
import imutils

# load the image, convert it to grayscale, and blur it to remove noise
image = cv2.imread("sample1.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (7, 7), 0)

# threshold the image
ret,thresh1 = cv2.threshold(gray ,127,255,cv2.THRESH_BINARY_INV)

# dilate the white portions
dilate = cv2.dilate(thresh1, None, iterations=2)

# find contours in the image
cnts = cv2.findContours(dilate.copy(), cv2.RETR_EXTERNAL,
    cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if imutils.is_cv2() else cnts[1]

orig = image.copy()
i = 0

for cnt in cnts:
    # Check the area of contour, if it is very small ignore it
    if(cv2.contourArea(cnt) < 100):
        continue

    # Filtered countours are detected
    x,y,w,h = cv2.boundingRect(cnt)

    # Taking ROI of the cotour
    roi = image[y:y+h, x:x+w]

    # Mark them on the image if you want
    cv2.rectangle(orig,(x,y),(x+w,y+h),(0,255,0),2)

    # Save your contours or characters
    cv2.imwrite("roi" + str(i) + ".png", roi)

    i = i + 1 

cv2.imshow("Image", orig) 
cv2.waitKey(0)

First of all I thresholded the image to convert it to black n white. I get characters in white portion of image and background as black. Then I Dilated the image to make the characters (white portions) thick, this will make it easy to find the appropriate contours. Then find findContours method is used to find the contours. Then we need to check that the contour is large enough, if the contour is not large enough then it is ignored ( because that contour is noise ). Then boundingRect method is used to find the rectangle for the contour. And finally, the detected contours are saved and drawn.

Input Image -

Threshold -

Dilated -

Contours -

Saved characters -

回答2:

Problem of eroded/cropped handwritten digits: you may solve this problem in the recognition step, or even in image improvement step (before recognition).

if only a very small part of digit is cropped (such your image example), it's enough to pad your image around by 1 or 2 pixels to make the segmentation process easy. Or some morpho filter (dilate) can improve your digit even after padding. (these solution are available in Opencv)
if a enough good part of digit is cropped, you need to add a degraded/cropped pattern of digits to the training Dataset used for digit recognition algorithm, (i.e. digit 3 with all possible cropping cases.. etc)

Problem of characters separation :

opencv offers blob detection algorithm that works well on your issue (choose the correct value for concave & convexity params)
opencv offers as well contour detector (canny() function), that helps to detect the contours of your character then you can find the fitted bounding (offered by Opencv as well : cv2.approxPolyDP(contour,..,..)) box around each character

来源：https://stackoverflow.com/questions/52995607/how-to-segment-handwritten-and-printed-digit-without-losing-information-in-openc

标签

python-3.x

OpenCV

image-processing

computer-vision

digits