Question
I have the following code (which is in fact just one of four parts needed to run the whole project I am working on):
# python classify.py --model models/svm.cpickle --image images/image.png
from __future__ import print_function
from sklearn.externals import joblib
from hog import HOG
import dataset
import argparse
import mahotas
import cv2

# parse the command line arguments
ap = argparse.ArgumentParser()
ap.add_argument("-m", "--model", required = True,
    help = "path to where the model will be stored")
ap.add_argument("-i", "--image", required = True,
    help = "path to the image file")
args = vars(ap.parse_args())

# load the trained model and initialize the HOG descriptor
model = joblib.load(args["model"])
hog = HOG(orientations = 18, pixelsPerCell = (10, 10),
    cellsPerBlock = (1, 1), transform = True)

# load the image, convert it to grayscale, blur it, and find edges
image = cv2.imread(args["image"])
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edged = cv2.Canny(blurred, 30, 150)

# find the contours and sort them left to right by the x coordinate
# of their bounding boxes
(_, cnts, _) = cv2.findContours(edged.copy(), cv2.RETR_EXTERNAL,
    cv2.CHAIN_APPROX_SIMPLE)
cnts = sorted([(c, cv2.boundingRect(c)[0]) for c in cnts],
    key = lambda x: x[1])

for (c, _) in cnts:
    (x, y, w, h) = cv2.boundingRect(c)

    # ignore contours that are too small to be a digit
    if w >= 7 and h >= 20:
        # threshold the ROI with Otsu's method, invert it, then
        # deskew and center it in a 20x20 cell
        roi = gray[y:y + h, x:x + w]
        thresh = roi.copy()
        T = mahotas.thresholding.otsu(roi)
        thresh[thresh > T] = 255
        thresh = cv2.bitwise_not(thresh)
        thresh = dataset.deskew(thresh, 20)
        thresh = dataset.center_extent(thresh, (20, 20))
        cv2.imshow("thresh", thresh)

        # describe the ROI with HOG and classify it
        hist = hog.describe(thresh)
        digit = model.predict([hist])[0]
        print("I think that number is: {}".format(digit))

        # draw the prediction on the original image
        cv2.rectangle(image, (x, y), (x + w, y + h),
            (0, 255, 0), 1)
        cv2.putText(image, str(digit), (x - 10, y - 10),
            cv2.FONT_HERSHEY_SIMPLEX, 1.2, (0, 255, 0), 2)
        cv2.imshow("image", image)
        cv2.waitKey(0)
This code detects and recognizes handwritten digits from images. Here is an example:
Let's say I don't care about the recognition accuracy.
My problem is the following: as you can see, the program takes all the numbers it can see and prints them to the console. From the console I can save them to a text file if I want, BUT I can't tell the program that there is a space between the numbers.
What I want is that, if I print the numbers to a text file, they should be separated as in the image (sorry, it's a bit hard to explain). The numbers should not be printed all together (even in the console); where there is a blank space in the image, a blank area should be printed too.
Take a look at the first image: after the first 10 digits there is a blank space in the image which there isn't in the console.
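To make the goal concrete, this is roughly the output logic I am after (a self-contained sketch of my own; the `(digit, x, w)` tuples and the `gap_threshold` value are made-up placeholders, not part of the project):

def digits_to_line(found, gap_threshold=15):
    # `found` holds (digit, x, w) tuples already sorted left to right;
    # insert a space wherever the horizontal gap between neighbouring
    # bounding boxes exceeds gap_threshold pixels (value needs tuning)
    line = ""
    prev_right = None
    for digit, x, w in found:
        if prev_right is not None and x - prev_right > gap_threshold:
            line += " "
        line += str(digit)
        prev_right = x + w
    return line

# e.g. two digits close together, then one after a wide gap
print(digits_to_line([(3, 10, 12), (7, 25, 12), (1, 80, 12)]))
# expected output: "37 1"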
Anyway, here is a link to the full code. There are 4 .py files and 3 folders. To execute, open a CMD in the folder and run the command python classify.py --model models/svm.cpickle --image images/image.png, where image.png is the name of one file in the images folder.
Full Code
Thanks in advance. In my opinion all this work would have to be done using neural networks, but I want to try it this way first. I'm pretty new to this.
Answer 1:
This is a starter solution.
I don't have anything in Python for the time being, but it shouldn't be hard to convert. Plus, the OpenCV function calls are similar, and I've linked them below.
TLDR;
Find the centres of your boundingRects, then find the distance between them. If one rect is more than a certain threshold away, you may assume it to be a space.
First, find the centres of your bounding rectangles
vector<Point2f> centres;

for(size_t index = 0; index < contours.size(); ++index)
{
    Moments moment = moments(contours[index]);
    centres.push_back(Point2f(static_cast<float>(moment.m10/moment.m00), static_cast<float>(moment.m01/moment.m00)));
}
(Optional but recommended)
You can draw the centres to have a visual understanding of them.
for(size_t index = 0; index < centres.size(); ++index)
{
    Scalar colour = Scalar(255, 255, 0);
    circle(frame, centres[index], 2, colour, 2);
}
With this, just iterate through them, checking whether the distance to the next one exceeds a reasonable threshold.
// stop one short of the end so that `index + 1` stays in bounds
for(size_t index = 0; index + 1 < centres.size(); ++index)
{
    // this is just a sample value. Tweak it around to see which value actually makes sense
    double distance = 0.5;

    Point2f current = centres[index];
    Point2f nextPoint = centres[index + 1];

    // norm calculates the euclidean distance between two points
    if(norm(nextPoint - current) >= distance)
    {
        // TODO: This is a potential space??
    }
}
You can read more about moments, norm and circle drawing calls in Python.
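Since the question's code is in Python, here is a rough, untested sketch of the same idea converted (it assumes `contours` comes from the question's `cv2.findContours` call and `frame` is the image being annotated; the 0.5 threshold is the same sample value as above):

import cv2
import numpy as np

# find the centres of the contours via image moments,
# skipping degenerate contours where m00 == 0
centres = []
for c in contours:
    M = cv2.moments(c)
    if M["m00"] > 0:
        centres.append((M["m10"] / M["m00"], M["m01"] / M["m00"]))

# (optional) draw the centres to get a visual understanding of them
for (cx, cy) in centres:
    cv2.circle(frame, (int(cx), int(cy)), 2, (255, 255, 0), 2)

# walk through neighbouring centres; a large jump between two of
# them is a potential space
distance = 0.5  # sample value, tweak it around
for current, nxt in zip(centres, centres[1:]):
    # euclidean distance between the two centres
    if np.linalg.norm(np.subtract(nxt, current)) >= distance:
        # potential space between these two digits
        pass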
Happy coding, Cheers mate :)
Answer 2:
I used this code to do the job. It detects regions of text/digits in images.
import cv2

image = cv2.imread("C:\\Users\\Bob\\Desktop\\PyHw\\images\\test5.png")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) # grayscale
_, thresh = cv2.threshold(gray, 150, 255, cv2.THRESH_BINARY_INV) # threshold
kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
dilated = cv2.dilate(thresh, kernel, iterations = 13) # dilate
_, contours, hierarchy = cv2.findContours(dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE) # get contours

idx = 0

# for each contour found, draw a rectangle around it on original image
for contour in contours:
    idx += 1

    # get rectangle bounding contour
    [x, y, w, h] = cv2.boundingRect(contour)

    # discard areas that are too large
    if h > 300 and w > 300:
        continue

    # discard areas that are too small
    if h < 40 or w < 40:
        continue

    # draw rectangle around contour on original image
    #cv2.rectangle(image, (x, y), (x + w, y + h), (255, 0, 255), 2)

    roi = image[y:y + h, x:x + w]
    cv2.imwrite('C:\\Users\\Bob\\Desktop\\' + str(idx) + '.jpg', roi)
    cv2.imshow('img', roi)
    cv2.waitKey(0)
The code is based on this other question/answer: Extracting text OpenCV
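To tie this back to the spacing problem, one possible follow-up (my own sketch, not from the linked answer): sort the detected regions left to right and join the digits recognized inside each region with a space. `classify_digits` below is a hypothetical stand-in for running the HOG + SVM loop from the question on a single region.

# sort the detected region boxes left to right
# (boundingRect returns (x, y, w, h), so tuples sort by x first)
boxes = sorted(cv2.boundingRect(c) for c in contours)

words = []
for (x, y, w, h) in boxes:
    # same size filters as above
    if h > 300 and w > 300:
        continue
    if h < 40 or w < 40:
        continue
    roi = image[y:y + h, x:x + w]
    words.append(classify_digits(roi))  # hypothetical helper

# each region is a group of digits; separate the groups with spaces
print(" ".join(words))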
Source: https://stackoverflow.com/questions/46001090/detect-space-between-text-opencv-python