Removing the background noise of a captcha image by replicating the chopping filter of TesserCap

后端未结

关注

 2  782

梦毁少年i 2021-02-01 10:12

I have a captcha image that looks like this:

$\"\"$

Using a utility called TesserCap from McAfee, I could app

2条回答

隐瞒了意图╮ (楼主)

2021-02-01 10:35

The algorithm essentially checks if there are multiple target pixels (in this case, non-white pixels) in a row, and changes those pixels if the number of pixels is less than or equal to the chop factor.

For example, in a sample row of pixels, where # is black and - is white, applying a chop factor of 2 would transform --#--###-##---#####---#-# into ------###-------#####-------. This is because there sequences of black pixels that are smaller than or equal to 2 pixels, and these sequences are replaced with white. The continuous sequences of greater than 2 pixels remain.

This is the result of the chop algorithm as implemented in my Python code (below) on the original image on your post:

'Chopped' image

In order to apply this to the whole image, you simply perform this algorithm on every row and on every column. Here's Python code that accomplishes that:

import PIL.Image
import sys

# python chop.py [chop-factor] [in-file] [out-file]

chop = int(sys.argv[1])
image = PIL.Image.open(sys.argv[2]).convert('1')
width, height = image.size
data = image.load()

# Iterate through the rows.
for y in range(height):
    for x in range(width):

        # Make sure we're on a dark pixel.
        if data[x, y] > 128:
            continue

        # Keep a total of non-white contiguous pixels.
        total = 0

        # Check a sequence ranging from x to image.width.
        for c in range(x, width):

            # If the pixel is dark, add it to the total.
            if data[c, y] < 128:
                total += 1

            # If the pixel is light, stop the sequence.
            else:
                break

        # If the total is less than the chop, replace everything with white.
        if total <= chop:
            for c in range(total):
                data[x + c, y] = 255

        # Skip this sequence we just altered.
        x += total


# Iterate through the columns.
for x in range(width):
    for y in range(height):

        # Make sure we're on a dark pixel.
        if data[x, y] > 128:
            continue

        # Keep a total of non-white contiguous pixels.
        total = 0

        # Check a sequence ranging from y to image.height.
        for c in range(y, height):

            # If the pixel is dark, add it to the total.
            if data[x, c] < 128:
                total += 1

            # If the pixel is light, stop the sequence.
            else:
                break

        # If the total is less than the chop, replace everything with white.
        if total <= chop:
            for c in range(total):
                data[x, y + c] = 255

        # Skip this sequence we just altered.
        y += total

image.save(sys.argv[3])

0 讨论(0)

查看其它2个回答