I have a set of images, each representing a letter extracted from an image of a word. Some of the images contain remnants of the adjacent letters, and I want to remove them but do not know how.
Some samples
I'm working with OpenCV and have tried two approaches, but neither works.
With findContours:
def is_contour_bad(c):
    return len(c) < 50
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
edged = cv2.Canny(gray, 50, 100)
contours = cv2.findContours(edged.copy(), cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
contours = contours[0] if imutils.is_cv2() else contours[1]
mask = np.ones(image.shape[:2], dtype="uint8") * 255
for c in contours:
    # if the contour is bad, draw it on the mask
    if is_contour_bad(c):
        cv2.drawContours(mask, [c], -1, 0, -1)
# remove the contours from the image and show the resulting images
image = cv2.bitwise_and(image, image, mask=mask)
cv2.imshow("After", image)
I think it does not work because the fragment lies on the edge of the image, so cv2.drawContours cannot calculate the area correctly and does not remove the interior points.
With connectedComponentsWithStats:
cv2.imshow("Image", img)
nb_components, output, stats, centroids = cv2.connectedComponentsWithStats(img)
sizes = stats[1:, -1]
nb_components = nb_components - 1
min_size = 150
img2 = np.zeros(output.shape)
for i in range(0, nb_components):
    if sizes[i] >= min_size:
        img2[output == i + 1] = 255
cv2.imshow("After", img2)
In this case I do not know why the small elements on the sides are not recognized as connected components.
I would greatly appreciate any help!
At the very beginning of the question you mention that the letters were extracted from an image of a word.
I think that if the extraction had been done correctly, you would not have faced this problem. I can give you a solution that applies both to extracting the letters from the original image and to separating the letters in the images you have given.
You can use convex hull coordinates to separate the characters, like this:
import cv2
import numpy as np
img = cv2.imread('test.png', 0)
img2 = img.copy()
ret, threshed_img = cv2.threshold(img, 127, 255, cv2.THRESH_BINARY)
# [-2:] keeps (contours, hierarchy) whether findContours returns two or three values
contours, hier = cv2.findContours(threshed_img, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)[-2:]
#--- Black image to be used to draw individual convex hull ---
black = np.zeros_like(img)
# Sort the contours from left to right by the x coordinate of their bounding box
contours = sorted(contours, key=lambda ctr: cv2.boundingRect(ctr)[0])
for cnt in contours:
    hull = cv2.convexHull(cnt)
    black2 = black.copy()
    #--- Here is where I am filling the contour after finding the convex hull ---
    cv2.drawContours(black2, [hull], -1, (255, 255, 255), -1)
    r, t2 = cv2.threshold(black2, 127, 255, cv2.THRESH_BINARY)
    masked = cv2.bitwise_and(img2, img2, mask=t2)
    cv2.imshow("masked.jpg", masked)
    cv2.waitKey(0)  # wait for a key press so each masked character is actually displayed
cv2.destroyAllWindows()
As I suggested, it is better to apply this solution when you extract the characters from the original image than to remove noise after extraction.
I would try the following:
1. Sum along the columns so that every image is projected onto a vector.
2. Assuming that white = 0 and black = 1, find the first index in that vector whose value is 0.
3. Remove the image columns to the left of the index from step 2.
4. Reverse the summed vector from step 1.
5. Find the first index whose value is 0 in the reversed vector from step 4.
6. Remove the image columns to the right of the position that the index from step 5 corresponds to in the original vector.
This would work nicely for a binary image where white = 0 and black = 1. If your images are not like that, there are several ways around it, including image thresholding or setting a tolerance level (e.g. in step 2, find the first index in the vector whose value is <= tolerance).
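The steps above can be sketched with NumPy roughly as follows. Treat it as a minimal sketch under stated assumptions: the function name `trim_side_fragments` and the `tolerance` parameter are names I made up, the input is assumed to be a 2-D array with ink = 1 and background = 0, and each side of the image is assumed to have either a fragment followed by a gap or an empty margin. If the letter itself touches a border with no gap, this would cut into the letter.

```python
import numpy as np

def trim_side_fragments(binary, tolerance=0):
    """Crop a letter image at the first empty column found from each side.

    Assumes `binary` is a 2-D array with ink = 1 and background = 0.
    Hypothetical helper for illustration.
    """
    # Step 1: project the image onto a vector by summing each column
    column_sums = binary.sum(axis=0)
    # A column counts as "empty" when its sum is at or below the tolerance
    empty = column_sums <= tolerance
    if not empty.any():
        return binary  # no gap found, nothing to trim
    # Steps 2-3: first empty column from the left; drop everything before it
    left = int(np.argmax(empty))
    # Steps 4-5: first empty column in the reversed vector
    right_reversed = int(np.argmax(empty[::-1]))
    # Step 6: map that index back to the original column order
    right = binary.shape[1] - 1 - right_reversed
    return binary[:, left:right + 1]

# Synthetic example: fragments on both sides of a 4-column-wide "letter"
img = np.zeros((5, 12), dtype=np.uint8)
img[1:4, 0:2] = 1     # left fragment, columns 0-1
img[0:5, 4:8] = 1     # the letter, columns 4-7
img[2:4, 10:12] = 1   # right fragment, columns 10-11
trimmed = trim_side_fragments(img)
```

In this example the crop keeps columns 2 to 9, so both fragments are dropped and the letter stays intact with one or two empty columns around it.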