Detect if a text image is upside down

孤城傲影 2021-01-31 03:06

I have a few hundred images of scanned documents, most of which are skewed, and I want to deskew them using Python.
Here is the code I used:

import numpy a         


        
4 answers
  •  野趣味 (OP)
    2021-01-31 03:24

    Assuming you have already run the angle correction on the image, you can try the following to find out whether it is flipped:

    1. Project the corrected image onto the y-axis (sum the pixels in each row), so that you get a 'peak' for each line of text. Important: each peak almost always consists of two sub-peaks!
    2. Smooth this projection by convolving it with a Gaussian to get rid of fine structure, noise, etc.
    3. For each peak, check if the stronger sub-peak is on top or at the bottom.
    4. Calculate the fraction of peaks that have sub-peaks on the bottom side. This is your scalar value that gives you the confidence that the image is oriented correctly.

    The peak finding in step 3 works by locating contiguous sections where the smoothed projection is above its average. Within each section, the stronger sub-peak is then found via argmax.
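The section finding and the argmax check can be tried on a toy array before touching real scans. This is only a minimal sketch: the projection values are invented for illustration, and `np.diff` is used here in place of a convolution with `[1, -1]`, which marks the same start and end positions.

```python
import numpy as np

# Hypothetical smoothed projection: two "text lines" separated by blank rows.
smooth = np.array([0, 0, 5, 9, 6, 4, 0, 0, 0, 3, 4, 7, 8, 0, 0], dtype=float)

# Rows above the average are treated as belonging to a text line.
mask = smooth > smooth.mean()

# Differencing the padded mask gives +1 where a section starts
# and -1 at the first row after it ends.
edges = np.diff(mask.astype(int), prepend=0, append=0)
starts = np.where(edges == 1)[0]
ends = np.where(edges == -1)[0]

results = []
for start, end in zip(starts, ends):
    segment = smooth[start:end]
    peak = int(np.argmax(segment))  # offset of the strongest row in the section
    half = "first" if peak < len(segment) / 2 else "second"
    results.append((int(start), int(end), half))
    print(f"rows {start}..{end - 1}: strongest row at offset {peak} ({half} half)")
```

Padding the mask with zeros on both sides guarantees that every start has a matching end, even when a section touches the first or last row.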

    Here's a figure illustrating the approach on a few lines of your example image:

    • Blue: Original projection
    • Orange: smoothed projection
    • Horizontal line: average of the smoothed projection for the whole image.

    Here's some code that does this:

    import cv2
    import numpy as np
    
    # Load the image, convert to grayscale, threshold at 127 and invert,
    # so that text pixels are 255 and the background is 0.
    page = cv2.imread('Page.jpg')
    if page is None:
        raise FileNotFoundError('Page.jpg could not be read')
    page = cv2.cvtColor(page, cv2.COLOR_BGR2GRAY)
    page = cv2.threshold(page, 127, 255, cv2.THRESH_BINARY_INV)[1]
    
    # Project the page onto the y-axis (sum each row) and smooth with a Gaussian.
    projection = np.sum(page, 1)
    gaussian_filter = np.exp(-(np.arange(-3, 3, 0.1)**2))
    gaussian_filter /= np.sum(gaussian_filter)
    smooth = np.convolve(projection, gaussian_filter)
    
    # Find the row indices where text lines start (+1) and end (-1).
    mask = smooth > np.average(smooth)
    edges = np.convolve(mask.astype(int), [1, -1])
    line_starts = np.where(edges == 1)[0]
    line_endings = np.where(edges == -1)[0]
    
    # Count lines whose strongest sub-peak lies in the first half of the
    # line's row range (i.e. at the smaller row indices).
    lower_peaks = 0
    for start, end in zip(line_starts, line_endings):
        line = smooth[start:end]
        if np.argmax(line) < len(line)/2:
            lower_peaks += 1
    
    print(lower_peaks / len(line_starts))
    

    This prints 0.125 for the given image, so the image is not oriented correctly and must be flipped.
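Acting on the score is then a 180 degree rotation. The helper below is a sketch rather than part of the original script: the name `flip_if_upside_down` and the 0.5 cut-off are assumptions, and the tiny array stands in for a real page.

```python
import numpy as np

def flip_if_upside_down(image, score, threshold=0.5):
    """Rotate the image by 180 degrees when the score is below the threshold.

    The threshold of 0.5 is an assumed cut-off, not a tuned value.
    """
    if score < threshold:
        # Two quarter turns in the image plane; also works for colour images,
        # because only the first two axes are rotated.
        return np.rot90(image, 2)
    return image

# Stand-in for a page; a real image would come from cv2.imread.
page = np.array([[1, 2, 3],
                 [4, 5, 6]])
fixed = flip_if_upside_down(page, 0.125)
print(fixed)
```

With OpenCV at hand, `cv2.rotate(page, cv2.ROTATE_180)` should give the same result as the `np.rot90` call.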

    Note that this approach might break badly if the page contains anything that is not organized in lines, such as pictures or math. Another problem is too few lines, which results in poor statistics.
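One way to guard against poor statistics is to refuse a decision when there are too few lines or when the fraction is close to one half. This is only a sketch: `classify_orientation`, `min_lines` and `margin` are hypothetical names and values that would need tuning on real data, and `lower_peaks` is the counter from the script above.

```python
def classify_orientation(lower_peaks, n_lines, min_lines=5, margin=0.15):
    """Turn the peak count into 'upright', 'upside down' or 'uncertain'.

    min_lines and margin are assumed defaults, not values from the answer.
    """
    if n_lines < min_lines:
        return "uncertain"  # too few lines for a meaningful fraction
    score = lower_peaks / n_lines
    if score >= 0.5 + margin:
        return "upright"
    if score <= 0.5 - margin:
        return "upside down"
    return "uncertain"  # fraction too close to a coin flip

print(classify_orientation(1, 8))  # 1 of 8 lines gives a score of 0.125
print(classify_orientation(1, 3))  # only three lines detected
```

Pages that come back as "uncertain" can be set aside for manual inspection instead of being flipped blindly.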

    Also, different fonts might produce different distributions. You can try this on a few images and see whether the approach works. I don't have enough data to say.
