Python OpenCV skew correction for OCR

被撕碎了的回忆 2020-11-27 07:59

Currently, I am working on an OCR project where I need to read the text off of a label (see example images below). I am running into issues with the image skew and I need he

2 answers
  • 2020-11-27 08:17

    ASSUMPTIONS:

    1. The content in your input image is not tilted by more than 45 degrees in either direction
    2. All of the content fits relatively well into one rectangular shape
    3. You've already applied the thresholding, and then possibly either erosion or clustering algorithms to get rid of the noise

    SOLUTION:

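    # Angle of the minimum-area rectangle around the cleaned-up foreground pixels
    hgt_rot_angle = cv2.minAreaRect(your_CLEAN_image_pixel_coordinates_to_enclose)[-1]
    # Fold the angle into the [-45, 45] range (see SOLUTION EXPLANATION)
    com_rot_angle = hgt_rot_angle + 90 if hgt_rot_angle < -45 else hgt_rot_angle
    
    # Rotate the original image about its center by the computed angle
    (h, w) = your_ORIGINAL_image.shape[0:2]
    center = (w // 2, h // 2)
    M = cv2.getRotationMatrix2D(center, com_rot_angle, 1.0)
    corrected_image = cv2.warpAffine(your_ORIGINAL_image, M, (w, h), flags=cv2.INTER_CUBIC, borderMode=cv2.BORDER_REPLICATE)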
    

    ORIGINAL SOURCE:

    https://www.pyimagesearch.com/2017/02/20/text-skew-correction-opencv-python/ - a GREAT tutorial to get started (kudos to Adrian Rosebrock), BUT:

    • It operates on clean synthesized images of text and has no noise reduction steps, or even references to them, only the thresholding. In most real-world scenarios, however, images that need rotation before OCR also need significant noise reduction. I have tried the OpenCV erosion operations and the scikit-learn DBSCAN clustering algorithm to pass only the "core" pixels to the above solution, and both worked reasonably well.
    • I think the explanation of how to interpret the angle value returned by cv2.minAreaRect() is not quite clear there, and the code uses the same variable for detection and for correction, which is even more confusing. I used separate variables for clarity, and my explanation of the first two lines of code is below.
    • I must respectfully disagree that we need to "take the inverse" of the detected angle of rotation (lines 38 and 43 in the tutorial) before passing the value to the cv2.getRotationMatrix2D() function, based on OpenCV documentation and based on my testing. More on this below as well.

    SOLUTION EXPLANATION:

    The cv2.minAreaRect() function returns the rotation angle value in the [-90, 0] range as the last element of the tuple returned, and the angle value is tied to the HEIGHT value in the same returned tuple (it's located at cv2.minAreaRect()[1][1], to be precise, but we're not using it here).

    Unless the angle of rotation is either -90.0 or 0.0, the decision of what dimension is chosen as the "height" is not arbitrary - it always has to go from upper left to lower right, i.e. to have a negative slope.

    What this means for our use case is that, depending on the width-height proportion of the content block and on its tilt, the "height" value returned by cv2.minAreaRect() can be either the content block's logical height OR the width.

    This means 2 things for us:

    1. We can't fix a tilt of over 45 degrees to either side without making assumptions about the "proper" aspect ratio.
    2. Without assumptions about the content block's aspect ratio, we HAVE TO ASSUME that the content is tilted by less than 45 degrees to either side, just in order to proceed. This assumption works very well for scans where only the portrait orientation was intended, but breaks for documents where just one page out of many was scanned in the landscape orientation. I have not tackled this problem yet.

    So, given (1) no assumptions about the content block's aspect ratio and (2) the assumed [-45:45] range of the tilt, we can get the common tilt of the height and the width relative to the rectangular coordinate system (in the [-45:45] range) by simply adding 90 degrees to the rotation value of the "height" if it falls below -45.0.
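As a rough illustration of the folding rule just described, the sketch below maps a raw angle to the common tilt. It assumes the [-90, 0] convention described in this answer. The function name fold_to_common_angle is made up for illustration, and other OpenCV releases may report the angle in a different range, so check what your version actually returns before relying on it.

```python
def fold_to_common_angle(hgt_rot_angle):
    """Fold an angle from the assumed [-90, 0] range into the [-45, 45] tilt range."""
    # Angles below -45 belong to the "other" side of the rectangle,
    # so adding 90 degrees gives the tilt shared by both sides.
    if hgt_rot_angle < -45:
        return hgt_rot_angle + 90
    return hgt_rot_angle


# A few sample values (invented for illustration)
for raw in (-88.0, -45.0, -3.0, 0.0):
    print(raw, "->", fold_to_common_angle(raw))
```

Note that exactly -45 is left unchanged, matching the strict "< -45" comparison in the solution code.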

    Once we get this detected and calculated "common rotation angle" value, we can use it to fix the tilt by just passing the value directly to the cv2.getRotationMatrix2D() function.
    NOTE: the calculated existing "common rotation angle" is negative for a counter-clockwise tilt and positive for a clockwise tilt, which is a very common everyday convention. However, if we think of the angle argument of cv2.getRotationMatrix2D() as "the correction angle to apply" (which, I think, was the intent), then the sign convention is the OPPOSITE. So we need to pass the detected and calculated "common rotation angle" value as-is if we want to see it counteracted in the output image, which is supported by the many tests that I have performed.
    This is a direct quote on the angle parameter from OpenCV documentation:

    Rotation angle in degrees. Positive values mean counter-clockwise rotation (the coordinate origin is assumed to be the top-left corner).
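To see what that sign convention means in image coordinates (where y points down), here is a small pure-Python sketch that builds a 2x3 matrix by hand, following the formula given in the OpenCV documentation for getRotationMatrix2D, and applies it to a single point. The helper names are hypothetical, and this only illustrates the direction of rotation. It is not a replacement for the OpenCV call.

```python
import math


def rotation_matrix_2d(center, angle_deg, scale=1.0):
    """Build a 2x3 affine matrix using the formula documented for getRotationMatrix2D."""
    cx, cy = center
    alpha = scale * math.cos(math.radians(angle_deg))
    beta = scale * math.sin(math.radians(angle_deg))
    return [
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ]


def apply_affine(M, point):
    """Map a source point (x, y) through the 2x3 matrix M."""
    x, y = point
    return (M[0][0] * x + M[0][1] * y + M[0][2],
            M[1][0] * x + M[1][1] * y + M[1][2])


center = (50, 50)
M = rotation_matrix_2d(center, 90)  # positive angle
x, y = apply_affine(M, (60, 50))    # a point 10 px to the right of the center
# The result is approximately (50, 40): y got smaller, so the point moved UP on screen
print(round(x, 6), round(y, 6))
```

A point to the right of the center ends up above it, which is a counter-clockwise rotation as seen on screen, in line with the quoted documentation.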

    WHAT IF THE SINGLE RECTANGLE IS A POOR FIT?

    The above solution works very well for densely populated full page scans, clean labels and things like that, but it does not work well at all for sparsely populated images, where the overall tightest fit is not a rectangle, i.e. when the 2nd starting assumption does not hold.

    In the latter scenario the following may work IF most of the individual shapes in the input image can nicely fit into rectangles, or at least better than all of the content combined:

    • Applying the thresholding / grading / morphing / erosion operations and, finally, contour detection in order to locate and outline the areas of the image that are likely to contain relevant content and not noise.
    • Getting the MAR (min area rectangle) for each contour and the rotation angle for each corresponding MAR.
    • Aggregating the results to arrive at the most probable overall tilt angle that needs to be fixed (the exact methods here are many).
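One simple way to do the aggregation step in the last bullet is to fold each per-contour angle into the [-45, 45] range and take the median, which is fairly robust to a few badly fitted contours. This is only one of many possible options and not necessarily what was used here. The function names and sample angles are invented, and the angles are assumed to follow the [-90, 0] convention described earlier.

```python
from statistics import median


def fold_to_common_angle(angle):
    """Fold an angle from the assumed [-90, 0] range into [-45, 45]."""
    return angle + 90 if angle < -45 else angle


def estimate_overall_tilt(per_contour_angles):
    """Median of the folded per-contour angles (one possible aggregation)."""
    folded = [fold_to_common_angle(a) for a in per_contour_angles]
    return median(folded)


# Hypothetical angles for five contours; -20.0 plays the role of an outlier
sample_angles = [-87.0, -86.5, -87.5, -20.0, -88.0]
print(estimate_overall_tilt(sample_angles))
```

Weighting each angle by the area of its rectangle, or discarding very small contours first, would be other reasonable choices.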

    OTHER SOURCES:

    https://www.pyimagesearch.com/2015/11/30/detecting-machine-readable-zones-in-passport-images/

    https://docs.opencv.org/master/dd/d49/tutorial_py_contour_features.html

  • 2020-11-27 08:19

    Here's an implementation of the Projection Profile Method to determine skew. After obtaining a binary image, the idea is to rotate the image at various angles and generate a histogram of pixels (the sum of each row) in each iteration. Each angle is scored by the sum of squared differences between adjacent histogram values, so sharp peaks score high. The angle with the highest score is taken as the skew angle and is used to rotate the image to correct the skew.


    Left (original), Right (corrected)

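    import cv2
    import numpy as np
    from scipy.ndimage import rotate  # scipy.ndimage.interpolation is deprecated
    
    def correct_skew(image, delta=1, limit=5):
        def determine_score(arr, angle):
            # Rotate the binary image and sum the pixels in each row
            data = rotate(arr, angle, reshape=False, order=0)
            # Signed sums, so the differences below cannot wrap around
            histogram = np.sum(data, axis=1, dtype=np.int64)
            # Sharp row-to-row changes mean the text lines are horizontal
            score = np.sum((histogram[1:] - histogram[:-1]) ** 2)
            return histogram, score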
    
        gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
        thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1] 
    
        scores = []
        angles = np.arange(-limit, limit + delta, delta)
        for angle in angles:
            histogram, score = determine_score(thresh, angle)
            scores.append(score)
    
        best_angle = angles[scores.index(max(scores))]
    
        (h, w) = image.shape[:2]
        center = (w // 2, h // 2)
        M = cv2.getRotationMatrix2D(center, best_angle, 1.0)
        rotated = cv2.warpAffine(image, M, (w, h), flags=cv2.INTER_CUBIC,
                                 borderMode=cv2.BORDER_REPLICATE)
    
        return best_angle, rotated
    
    if __name__ == '__main__':
        image = cv2.imread('1.png')
        angle, rotated = correct_skew(image)
        print(angle)
        cv2.imshow('rotated', rotated)
        cv2.imwrite('rotated.png', rotated)
        cv2.waitKey()
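
The scoring idea inside determine_score can be seen on a toy example without any image libraries. The sketch below uses hand-made 0/1 grids (invented for illustration) in place of a thresholded image. When the text rows line up with the pixel rows, the row sums alternate sharply and the sum of squared differences between neighboring rows is large. When the strokes are spread across rows, as in a skewed image, the score drops.

```python
def profile_score(binary_rows):
    """Sum of squared differences between adjacent row sums."""
    histogram = [sum(row) for row in binary_rows]
    return sum((b - a) ** 2 for a, b in zip(histogram, histogram[1:]))


# "Text lines" perfectly aligned with the pixel rows
aligned = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]

# Same number of foreground pixels, but spread evenly over the rows
skewed = [
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
]

print(profile_score(aligned), profile_score(skewed))
```

The aligned grid scores higher than the spread-out one, which is why the angle with the maximum score is picked as the best candidate.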
    