Currently, I am working on an OCR project where I need to read the text off of a label (see example images below). I am running into issues with the image skew and need help correcting it before the text is read.
ASSUMPTIONS:
1. The content of the image is tilted by no more than 45 degrees in either direction, and the image used for detection has been cleaned up so that only the content's pixels are passed to cv2.minAreaRect().
2. The tightest fit around all of the content combined is a rectangle, i.e. the image is densely populated (a clean label, a full-page scan, etc.).
SOLUTION:
# Detect the tilt: cv2.minAreaRect() returns ((cx, cy), (w, h), angle), and the angle
# (the last tuple element) is tied to the "height" of the minimum-area bounding box
# (this assumes the angle is reported in the [-90, 0] range, as explained below).
hgt_rot_angle = cv2.minAreaRect(your_CLEAN_image_pixel_coordinates_to_enclose)[-1]
# Normalize the detected angle into the [-45, 45] range (explained below).
com_rot_angle = hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle
# Rotate the ORIGINAL image around its center by the normalized angle to undo the tilt.
(h, w) = your_ORIGINAL_image.shape[0:2]
center = (w // 2, h // 2)
M = cv2.getRotationMatrix2D(center, com_rot_angle, 1.0)
corrected_image = cv2.warpAffine(your_ORIGINAL_image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)
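For context, your_CLEAN_image_pixel_coordinates_to_enclose above is just an N x 2 array of the foreground pixel coordinates of the cleaned-up image. A minimal sketch of one way to obtain it (following the pyimagesearch tutorial referenced below; the file name 'label.png' is only a placeholder), assuming dark text on a light background:
import cv2
import numpy as np
# Load the original image and clean it up: grayscale + inverted Otsu threshold,
# so that the content pixels become the non-zero (foreground) pixels.
your_ORIGINAL_image = cv2.imread('label.png')
gray = cv2.cvtColor(your_ORIGINAL_image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
# Collect the (row, col) coordinates of all foreground pixels; cast to float32
# because cv2.minAreaRect() expects a float32/int32 point array.
your_CLEAN_image_pixel_coordinates_to_enclose = np.column_stack(np.where(thresh > 0)).astype(np.float32)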
ORIGINAL SOURCE:
https://www.pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python/ - a GREAT tutorial to get started (kudos to Adrian Rosebrock), BUT:
- the logic of determining the rotation angle via cv2.minAreaRect() is not quite clear there, and the code uses the same variable for detection and for correction, which is even more confusing. I used separate variables for clarity, and my explanation of the first two lines of code is below.
- I disagree with flipping the sign of the detected angle before passing it to the cv2.getRotationMatrix2D() function, based on OpenCV documentation and based on my testing. More on this below as well.
SOLUTION EXPLANATION:
The cv2.minAreaRect() function returns the rotation angle value in the [-90, 0] range as the last element of the returned tuple, and that angle value is tied to the HEIGHT value in the same tuple (located at cv2.minAreaRect()[1][1], to be precise, but we're not using it here).
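For reference, here is the structure of that returned tuple (a quick illustrative snippet only; coords stands for the same pixel-coordinate array used in the solution above):
# cv2.minAreaRect() returns ((center_x, center_y), (width, height), angle), so the
# angle is available as [-1] and the "height" it is tied to as [1][1]:
((center_x, center_y), (rect_width, rect_height), hgt_rot_angle) = cv2.minAreaRect(coords)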
Unless the angle of rotation is either -90.0 or 0.0, the decision of which dimension is chosen as the "height" is not arbitrary: it always has to go from upper left to lower right, i.e. to have a negative slope.
What this means for our use case is that, depending on the width-to-height proportion of the content block and on its tilt, the "height" value returned by cv2.minAreaRect() can be either the content block's logical height OR its width.
This means 2 things for us:
1. We cannot assume which of the content block's logical dimensions the returned "height" (and therefore the returned angle) refers to.
2. The returned angle can be off from the content block's actual tilt by exactly 90 degrees, so it has to be normalized before use.
So, given (1) no assumptions about the content block's aspect ratio and (2) the assumed [-45, 45] range of the tilt, we can get the common tilt of the height and the width relative to the rectangular coordinate system (in the [-45, 45] range) by simply adding 90 degrees to the rotation value of the "height" whenever it falls below -45.0.
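A quick numeric illustration of this normalization (the specific values are made up for illustration only):
# The reported angle is more than 45 degrees below zero, so the actual common tilt
# is the reported value plus 90 -- here, a 15-degree (clockwise) tilt:
hgt_rot_angle = -75.0
com_rot_angle = hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle   # 15.0
# The reported angle is within [-45, 0], so it already IS the common tilt --
# here, a 30-degree counter-clockwise tilt:
hgt_rot_angle = -30.0
com_rot_angle = hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle   # -30.0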
Once we have this detected and calculated "common rotation angle" value, we can fix the tilt by simply passing the value directly to the cv2.getRotationMatrix2D() function.
NOTE: the calculated existing "common rotation angle" is negative for a counter-clockwise tilt and positive for a clockwise tilt, which is a very common everyday convention. However, if we think of the angle argument of cv2.getRotationMatrix2D() as "the correction angle to apply" (which, I think, was the intent), then the sign convention is the OPPOSITE. So we need to pass the detected and calculated "common rotation angle" value as-is if we want to see it counter-acted in the output image, which is supported by the many tests that I have performed.
This is a direct quote on the angle parameter from the OpenCV documentation:
Rotation angle in degrees. Positive values mean counter-clockwise rotation (the coordinate origin is assumed to be the top-left corner).
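A minimal snippet illustrating that documented convention (the image content here is arbitrary and just for demonstration):
import cv2
import numpy as np
# Draw some horizontal text, then rotate it with a POSITIVE angle: the result
# appears rotated counter-clockwise, exactly as the documentation states.
img = np.zeros((200, 400, 3), dtype=np.uint8)
cv2.putText(img, 'OCR', (100, 120), cv2.FONT_HERSHEY_SIMPLEX, 2, (255, 255, 255), 3)
M = cv2.getRotationMatrix2D((200, 100), 15, 1.0)
rotated_ccw = cv2.warpAffine(img, M, (400, 200))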
WHAT IF THE SINGLE RECTANGLE IS A POOR FIT?
The above solution works very well for densely populated full page scans, clean labels and things like that, but it does not work well at all for sparsely populated images, where the overall tightest fit is not a rectangle, i.e. when the 2nd starting assumption does not hold.
In the latter scenario, the following per-shape approach may work, IF most of the individual shapes in the input image fit into rectangles reasonably well, or at least better than all of the content combined does:
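The snippet for this per-shape variant is not reproduced above, so here is only a rough sketch of the idea, assuming OpenCV 4.x (where cv2.findContours() returns two values); the helper name estimate_tilt_per_contour, the min_area filter and the area-weighted averaging are my own illustrative choices, not necessarily the original approach:
import cv2
import numpy as np

def estimate_tilt_per_contour(binary_image, min_area=50):
    # Find the individual shapes in the cleaned-up (binary) image.
    contours, _ = cv2.findContours(binary_image, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    angles, weights = [], []
    for cnt in contours:
        area = cv2.contourArea(cnt)
        if area < min_area:
            # Skip tiny specks that would only add noise to the estimate.
            continue
        angle = cv2.minAreaRect(cnt)[-1]
        # Same normalization as in the single-rectangle solution above.
        angle = angle + 90 if angle < -45 else angle
        angles.append(angle)
        weights.append(area)
    if not angles:
        return 0.0
    # Aggregate the per-shape tilts; an area-weighted average is one simple option.
    return float(np.average(angles, weights=weights))
The returned value can then be passed to cv2.getRotationMatrix2D() just like the single-rectangle angle above; as with that case, it is worth verifying the sign on a few known-tilt samples first.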
OTHER SOURCES:
https://www.pyimagesearch.com/2015/11/30/detecting-machine-readable-zones-in-passport-images/
https://docs.opencv.org/master/dd/d49/tutorial_py_contour_features.html
Here's an implementation of the Projection Profile Method to determine skew. After obtaining a binary image, the idea is to rotate the image at various candidate angles and generate a histogram of row pixel sums at each iteration. To determine the skew angle, we score each candidate by the differences between neighboring histogram rows (sharper, better-separated text-line peaks score higher) and rotate the image by the best-scoring angle to correct the skew.
Left (original), Right (corrected)
import cv2
import numpy as np
# scipy.ndimage.rotate (the scipy.ndimage.interpolation namespace is deprecated)
from scipy.ndimage import rotate as ndimage_rotate

def correct_skew(image, delta=1, limit=5):
    def determine_score(arr, angle):
        # Rotate the binary image and build a horizontal projection profile
        # (the sum of foreground pixels in each row).
        data = ndimage_rotate(arr, angle, reshape=False, order=0)
        histogram = np.sum(data, axis=1)
        # Text lines aligned with the rows produce sharp peaks, i.e. large
        # differences between neighboring rows, i.e. a higher score.
        score = np.sum((histogram[1:] - histogram[:-1]) ** 2)
        return histogram, score

    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

    # Score every candidate angle in [-limit, limit] with step delta.
    scores = []
    angles = np.arange(-limit, limit + delta, delta)
    for angle in angles:
        histogram, score = determine_score(thresh, angle)
        scores.append(score)

    best_angle = angles[scores.index(max(scores))]

    # Rotate the original image by the best-scoring angle to correct the skew.
    (h, w) = image.shape[:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
    rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                             borderMode=cv2.BORDER_REPLICATE)

    return best_angle, rotated

if __name__ == '__main__':
    image = cv2.imread('1.png')
    angle, rotated = correct_skew(image)
    print(angle)
    cv2.imshow('rotated', rotated)
    cv2.imwrite('rotated.png', rotated)
    cv2.waitKey()