Remove remains in a letter image with Python

Question

I have a set of images of letters that were extracted from an image of a word. Some of these images contain remnants of the adjacent letters, and I want to remove them but do not know how.

Some samples

I'm working with OpenCV and have tried two approaches, but neither works.

With findContours:

def is_contour_bad(c):
    return len(c) < 50

gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edged = cv2.Canny(gray, 50, 100)

# [-2] selects the contour list whether findContours returns 2 values (OpenCV 2.x/4.x) or 3 (OpenCV 3.x)
contours = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)[-2]

mask = np.ones(image.shape[:2], dtype="uint8") * 255

for c in contours:
    # if the contour is bad, draw it on the mask
    if is_contour_bad(c):
        cv2.drawContours(mask, [c], -1, 0, -1)

# remove the contours from the image and show the resulting images
image = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("After", image)
cv2.waitKey(0)

I think it does not work because the remnants lie on the border of the image, so cv2.drawContours cannot fill the area correctly and does not remove the interior points.

With connectedComponentsWithStats:

cv2.imshow("Image", img)
cv2.waitKey(0)
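# Label the connected components (non-zero pixels count as foreground)
nb_components, output, stats, centroids = cv2.connectedComponentsWithStats(img)
# Drop label 0 (the background); the last stats column is the component area
sizes = stats[1:, -1]
nb_components = nb_components - 1

min_size = 150

# Keep only the components with at least min_size pixels
img2 = np.zeros(output.shape, dtype="uint8")
for i in range(0, nb_components):
    if sizes[i] >= min_size:
        img2[output == i + 1] = 255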

cv2.imshow("After", img2)
cv2.waitKey(0)

In this case I do not know why the small elements on the sides are not recognized as connected components.

I would greatly appreciate any help!

Answer 1:

At the very beginning of the question you mention that the letters were extracted from an image of a word.

So I think this problem could have been avoided by doing that extraction correctly in the first place. The solution below can be applied either when extracting the letters from the original image or to separate the letter from the remnants in the images you have given.

Solution:

You can use the convex hull of each contour to separate the characters, like this.

code:

import cv2
import numpy as np

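img = cv2.imread('test.png', 0)  # load as grayscale
cv2.bitwise_not(img, img)        # invert in place; findContours expects white objects on black
img2 = img.copy()

ret, threshed_img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
# [-2:] keeps (contours, hierarchy) whether findContours returns 2 or 3 values
contours, hier = cv2.findContours(threshed_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]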

#--- Black image to be used to draw individual convex hull ---
black = np.zeros_like(img)
contours = sorted(contours, key=lambda ctr: cv2.boundingRect(ctr)[0])

for cnt in contours:
    hull = cv2.convexHull(cnt)

    black2 = black.copy()  # fresh black mask for this contour

    #--- Here is where I am filling the contour after finding the convex hull ---
    cv2.drawContours(black2, [hull], -1, (255, 255, 255), -1)
    r, t2 = cv2.threshold(black2, 127, 255, cv2.THRESH_BINARY)
    masked = cv2.bitwise_and(img2, img2, mask = t2)
    cv2.imshow("masked.jpg", masked)
    cv2.waitKey(0)

cv2.destroyAllWindows()

outputs:

So, as I suggested, it is better to apply this approach while extracting the characters from the original image than to remove the noise after extraction.

Answer 2:

I would try the following:

1. Sum along the columns so that every image gets projected into a vector.
2. Assuming that white = 0 and black = 1, find the first index in that vector whose value is 0.
3. Remove the image columns to the left of the index from step 2.
4. Reverse the summed vector from step 1.
5. Find the first index whose value is 0 in the reversed vector from step 4.
6. Remove the image columns to the right of the reversed index from step 5.

This would work nicely for a binary image where white = 0 and black = 1. If your image is not in that form, there are several ways around it, including image thresholding or setting a tolerance level (e.g. in step 2, find the first index in the vector whose value is at or below the tolerance).
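The steps above could be sketched with NumPy roughly as follows. The function name trim_side_remnants, the tolerance parameter and the toy array are assumptions for illustration only. The sketch also assumes that there is at least one empty column between the letter and whatever lies on each side of it; if the letter itself touches the image border, the cut lands in the wrong place.

```python
import numpy as np

def trim_side_remnants(ink, tolerance=0):
    """Crop a letter image at the first empty column found from each side.

    Hypothetical helper; `ink` is a 2-D array with 1 for ink and 0 for background.
    """
    col_sums = ink.sum(axis=0)                      # step 1: project onto a vector
    gaps = np.flatnonzero(col_sums <= tolerance)    # columns that count as empty
    if gaps.size == 0:
        return ink                                  # no gap found, nothing to cut
    left = gaps[0]                                  # steps 2-3: first empty column
    # steps 4-5: first empty column in the reversed vector
    rev_index = np.flatnonzero(col_sums[::-1] <= tolerance)[0]
    right = len(col_sums) - rev_index               # step 6: exclusive end of the crop
    return ink[:, left:right]

# Toy example: letter in the middle, remnants touching both borders
ink = np.zeros((10, 20), dtype=int)
ink[2:8, 0:2] = 1      # left remnant
ink[1:9, 5:13] = 1     # the letter
ink[3:6, 18:20] = 1    # right remnant
trimmed = trim_side_remnants(ink)
```

For a grayscale image with dark letters on a light background, something like `ink = (gray < 128).astype(int)` would produce the expected input; the threshold of 128 is only a starting point.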

Source: https://stackoverflow.com/questions/53504738/remove-remains-in-a-letter-image-with-python

Tags: python, OpenCV, image-processing, cv2, outliers