Removing the background noise of a captcha image by replicating the chopping filter of TesserCap

梦毁少年i 2021-02-01 10:12

I have a captcha image that looks like this:

[image: captcha]

Using a utility called TesserCap from McAfee, I could app

2 Answers
  • 2021-02-01 10:32

    Try something like this (pseudocode):

    for each row of pixels:
        if there is a group of about 3 or more pixels in a row, leave them
        else remove the pixels
    

    Then simply repeat the same thing for the columns. Seems like it might work at least a little. Going both horizontally and vertically like this will remove horizontal/vertical lines as well.
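A minimal Python sketch of the pseudocode above, assuming the image has already been binarized into a list of rows where 1 is a dark pixel and 0 is background. The names `keep_long_runs`, `clean`, and `min_run` are hypothetical, and the threshold of 3 comes from the "about 3 or more" guess in the pseudocode, so it may need tuning:

```python
def keep_long_runs(line, min_run=3):
    """Return a copy of `line` keeping only runs of at least `min_run` dark pixels."""
    out = [0] * len(line)
    i = 0
    while i < len(line):
        if line[i] == 0:
            i += 1
            continue
        # Find the end of the run of dark pixels that starts at i.
        j = i
        while j < len(line) and line[j] == 1:
            j += 1
        # Copy the run over only if it is long enough.
        if j - i >= min_run:
            out[i:j] = line[i:j]
        i = j
    return out


def clean(grid, min_run=3):
    """Filter every row, then every column, of a 0/1 grid."""
    rows = [keep_long_runs(row, min_run) for row in grid]
    # Transpose so that columns become rows, filter them, and transpose back.
    cols = [keep_long_runs(list(col), min_run) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]
```

A one-pixel-thick horizontal line survives the row pass but is removed by the column pass, since each of its columns contains only a single dark pixel.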

  • 2021-02-01 10:35

    The algorithm essentially scans for runs of consecutive target pixels (in this case, non-white pixels) and replaces a run with white if its length is less than or equal to the chop factor.

    For example, in a sample row of pixels, where # is black and - is white, applying a chop factor of 2 would transform --#--###-##---#####---#-# into -----###------#####------. This is because there are sequences of black pixels that are 2 pixels long or shorter, and these sequences are replaced with white. The continuous sequences of more than 2 pixels remain.
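The single-row behaviour can be checked on text with a small sketch. It uses a regular expression rather than pixel access, purely for illustration. The function name `chop_row` and its `dark`/`light` parameters are hypothetical and not part of TesserCap:

```python
import re


def chop_row(row, chop, dark="#", light="-"):
    """Replace every run of `dark` characters of length <= chop with `light`."""
    pattern = re.compile(re.escape(dark) + "+")

    def replace(match):
        run = match.group(0)
        # Short runs are wiped out; longer runs are kept unchanged.
        return light * len(run) if len(run) <= chop else run

    return pattern.sub(replace, row)


print(chop_row("--#--###-##---#####---#-#", 2))
# prints -----###------#####------
```

The output has the same length as the input, because each removed run is replaced character for character.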

    This is the result of the chop algorithm as implemented in my Python code (below) on the original image in your post:

    'Chopped' image

    In order to apply this to the whole image, you simply perform this algorithm on every row and on every column. Here's Python code that accomplishes that:

    import PIL.Image
    import sys
    
    # python chop.py [chop-factor] [in-file] [out-file]
    
    chop = int(sys.argv[1])
    image = PIL.Image.open(sys.argv[2]).convert('1')
    width, height = image.size
    data = image.load()
    
    # Iterate through the rows.
    for y in range(height):
        # Use a while loop so the index can jump past a whole run;
        # reassigning the variable of a for loop has no effect in Python.
        x = 0
        while x < width:

            # Skip light pixels.
            if data[x, y] >= 128:
                x += 1
                continue

            # Count the contiguous dark pixels starting at x.
            total = 0
            for c in range(x, width):

                # If the pixel is dark, add it to the total.
                if data[c, y] < 128:
                    total += 1

                # If the pixel is light, stop the sequence.
                else:
                    break

            # If the total is less than or equal to the chop, replace the run with white.
            if total <= chop:
                for c in range(total):
                    data[x + c, y] = 255

            # Skip the run we just examined, so its tail is not chopped later.
            x += total
    
    
    # Iterate through the columns.
    for x in range(width):
        # Use a while loop so the index can jump past a whole run;
        # reassigning the variable of a for loop has no effect in Python.
        y = 0
        while y < height:

            # Skip light pixels.
            if data[x, y] >= 128:
                y += 1
                continue

            # Count the contiguous dark pixels starting at y.
            total = 0
            for c in range(y, height):

                # If the pixel is dark, add it to the total.
                if data[x, c] < 128:
                    total += 1

                # If the pixel is light, stop the sequence.
                else:
                    break

            # If the total is less than or equal to the chop, replace the run with white.
            if total <= chop:
                for c in range(total):
                    data[x, y + c] = 255

            # Skip the run we just examined, so its tail is not chopped later.
            y += total
    
    image.save(sys.argv[3])
    