Align text for OCR

前端 未结 3 1003
萌比男神i
萌比男神i 2021-01-30 10:08

I am creating a database from historical records which I have as photographed pages from books (+100K pages). I wrote some python code to do some image processing before I OCR e

3条回答
  •  一生所求
    2021-01-30 10:36

    Preface: I haven't done much image processing with python. I can give you an image processing suggestion, but you'll have to implement it in Python yourself. All you need is a FFT and a polar transformation (I think OpenCV has an in-built function for that), so that should be straightforward.

    You have only posted one sample image, so I don't know if this works as well for other images, but for this image, a Fourier transform can be very useful: Simply pad the image to a nice power of two (e.g. 2048x2048) and you get a Fourier spectrum like this:

    I've posted a intuitive explanation of the Fourier transform here, but in short: your image can be represented as a series of sin/cosine waves, and most of those "waves" are parallel or perpendicular to the document orientation. That's why you see a strong frequency response at roughly 0°, 90°, 180° and 270°. To measure the exact angle, you could take a polar transform of the Fourier spectrum:

    and simply take the columnwise mean:

    The peak position in that diagram is at 90.835°, and if I rotate the image by -90.835 modulo 90, the orientation looks decent:

    Like I said, I don't have more test images, but it works for rotated versions of your image. At the very least it should narrow down the search space for a more expensive search method.

    Note 1: The FFT is fast, but it obviously takes more time for larger images. And sadly the best way to get a better angle resolution is to use a larger input image (i.e. with more white padding around the source image.)

    Note 2: the FFT actually returns an image where the "DC" (the center in the spectrum image above) is at the origin 0/0. But the rotation property is clearer if you shift it to the center, and it makes the polar transform easier, so I just showed the shifted version.

提交回复
热议问题