Python captcha decoder library

野的像风 2020-12-09 07:28

I need a captcha decoder for Python that can read simple image captchas like the one in the following picture:

[Image: example captcha]

3 answers
  • 2020-12-09 07:49

    I hope you are using it in good faith and are not going to harm or spam anyone.

    I won't write the script for you, nor point you to an external plugin. But in case you are writing this on your own, this may help:

    • If you are trying to decode a specific captcha pattern, collect all the characters it uses (from the examples you attached, it's only digits, so it shouldn't be a lot of work).
    • Put all of the characters in one file and analyze it with PIL.
    • Save each character in an array, together with its position and its meaning.
    • Get a captcha image and clear the background noise if necessary.
    • Split the captcha image into character-sized pieces and look each piece up in your self-made dictionary of characters.
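    The steps above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the answerer's code: it assumes each image has already been read into a list of pixel rows holding grayscale values (0 = black, 255 = white), that the digits are dark on a light background, and that every character occupies an equal-width cell with no gap between cells. The names `binarize`, `split_chars`, `build_dictionary`, `hamming` and `decode` are illustrative, and the threshold of 128 is a guess you would tune for your own images.

```python
# Hypothetical sketch of the "dictionary of chars" approach.
# Images are lists of pixel rows with grayscale values 0 (black) .. 255 (white).

THRESHOLD = 128  # assumed cut-off: darker pixels count as ink


def binarize(img, threshold=THRESHOLD):
    """Clear background noise: 1 for ink pixels, 0 for everything lighter."""
    return [[1 if px < threshold else 0 for px in row] for row in img]


def split_chars(bits, count, width):
    """Cut a binarized image into `count` adjacent cells `width` pixels wide.

    Each cell is returned as a tuple of tuples so it can be used as a dict key.
    """
    return [tuple(tuple(row[n * width:(n + 1) * width]) for row in bits)
            for n in range(count)]


def build_dictionary(samples, width):
    """Map each glyph bitmap to its meaning, from (image, known_text) pairs."""
    table = {}
    for img, text in samples:
        cells = split_chars(binarize(img), len(text), width)
        for cell, char in zip(cells, text):
            table[cell] = char
    return table


def hamming(a, b):
    """Count the pixels that differ between two equally sized glyphs."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def decode(img, table, count, width):
    """Look up each cell; fall back to the closest known glyph if none matches."""
    chars = []
    for cell in split_chars(binarize(img), count, width):
        if cell in table:
            chars.append(table[cell])
        else:
            closest = min(table, key=lambda glyph: hamming(glyph, cell))
            chars.append(table[closest])
    return "".join(chars)


# Two hand-solved samples, "17" and "4", with 2x2-pixel characters.
samples = [([[255, 0, 0, 0], [255, 0, 255, 0]], "17"),
           ([[0, 255], [0, 0]], "4")]
table = build_dictionary(samples, width=2)
print(decode([[0, 0, 255, 0], [255, 0, 255, 0]], table, count=2, width=2))  # 71
```

    Exact lookups only succeed when a cleaned-up glyph is pixel-identical to a stored one; the Hamming-distance fallback tolerates a few stray pixels that survive the thresholding.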
  • 2020-12-09 07:55

    I hope this captcha is not used anywhere.

    The following is a naive way to decode it. Basically, what you need are the patterns for the digits 0 to 9 as they appear in these captchas. From your examples, I only have the patterns for 0, 3, 4, 5, 7 and 8. Since the layout is fixed, you know where to split each character, and you also know that each character is a digit of fixed size and fixed font. If the captchas also include letters or more characters, still of fixed size and font, the following code can easily be adapted.

    The code does the following: a) loads the patterns (I assumed they are named n0.png, n1.png, ...); b) splits the captcha into NUMS pieces; c) computes a sum of squared differences between each pattern and each split number; d) decides that each split number is the pattern with the smallest sum. It returns a list of the numbers present in the captcha, in order. To obtain the initial patterns, you can uncomment the lines that save the split numbers, place a return right after them, and adjust the file names.

    import sys
    from PIL import Image, ImageOps
    
    PAT_SIZE = (8, 10)       # width, height of one digit in pixels
    NUMS = 3                 # number of digits in each captcha
    FIRST_NUM_OFFSET = 5     # x coordinate of the first digit
    NUM_OFFSET = (1, 3)      # (horizontal gap between digits, y coordinate of the digits)
    
    
    # Load the reference pattern for each digit from n0.png ... n9.png.
    # Patterns are converted to grayscale so they can be compared pixel by
    # pixel with the grayscale captcha.
    NUMBERS = []
    for i in range(10):
        try:
            NUMBERS.append(ImageOps.grayscale(Image.open('n%d.png' % i)).load())
        except OSError:
            print("I do not know the pattern for the number %d." % i)
            NUMBERS.append(None)
    
    
    def magic(fname):
        captcha = ImageOps.grayscale(Image.open(fname))
    
        # Split numbers: digit n starts at FIRST_NUM_OFFSET + n * (gap + width).
        num = []
        for n in range(NUMS):
            x1, y1 = (FIRST_NUM_OFFSET + n * (NUM_OFFSET[0] + PAT_SIZE[0]),
                      NUM_OFFSET[1])
            num.append(captcha.crop((x1, y1, x1 + PAT_SIZE[0], y1 + PAT_SIZE[1])))
    
        # If you want to save the split numbers:
        #for i, n in enumerate(num):
        #    n.save('%d.png' % i)
    
        def sqdiff(a, b):
            if a is None or b is None:  # XXX This is here just to handle a missing pattern.
                return float('inf')
    
            d = 0
            for x in range(PAT_SIZE[0]):
                for y in range(PAT_SIZE[1]):
                    d += (a[x, y] - b[x, y]) ** 2
            return d
    
        # Calculate a dummy sum of squared differences between the patterns
        # and each number. We assume the smallest diff is the number in the
        # "captcha".
        result = []
        for n in num:
            n_sqdiff = [(sqdiff(p, n.load()), i) for i, p in enumerate(NUMBERS)]
            result.append(min(n_sqdiff)[1])
        return result
    
    
    if __name__ == '__main__':
        print(magic(sys.argv[1]))
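    The split step relies only on the fixed layout. As a quick check (a small standalone sketch, not part of the original script), the same arithmetic with the same constants gives the crop boxes below; the hypothetical `crop_boxes` helper mirrors the splitting loop in `magic()`. Digit n starts at `FIRST_NUM_OFFSET + n * (NUM_OFFSET[0] + PAT_SIZE[0])`, so with three 8-pixel-wide digits and a 1-pixel gap the captcha must be at least 31 pixels wide and 13 pixels tall. The constants were picked for the asker's image and are assumptions for any other captcha.

```python
# Standalone check of the crop-box arithmetic used in magic().
# The constants are the answer's values, assumed to fit the asker's captcha.
PAT_SIZE = (8, 10)        # width, height of one digit
NUMS = 3                  # digits per captcha
FIRST_NUM_OFFSET = 5      # x of the first digit
NUM_OFFSET = (1, 3)       # (gap between digits, y of every digit)


def crop_boxes(nums=NUMS):
    """Return the (left, upper, right, lower) boxes that magic() crops."""
    boxes = []
    for n in range(nums):
        x1 = FIRST_NUM_OFFSET + n * (NUM_OFFSET[0] + PAT_SIZE[0])
        y1 = NUM_OFFSET[1]
        boxes.append((x1, y1, x1 + PAT_SIZE[0], y1 + PAT_SIZE[1]))
    return boxes


print(crop_boxes())  # [(5, 3, 13, 13), (14, 3, 22, 13), (23, 3, 31, 13)]
```

    If these boxes don't line up with the digits in your own captcha, adjust the constants before saving the split numbers as patterns.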
    
  • 2020-12-09 07:58

    It is a nice project to do for academic reasons; I was interested in this a while ago. You have a few options:

    1. Write your own, with help from this site: http://www.wausita.com/captcha/

    2. Use OpenCV to do the matching.

    I think there was a dedicated library for neural-network image matching, but I can't seem to find it.

    Basically, as the others said, you want to remove the noise, split the image into single characters, and compare each one to the model characters using a technique of your choice.
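    For the comparison step, one technique other than the sum of squared differences from the previous answer is zero-mean normalized cross-correlation, which is essentially the score OpenCV's `cv2.matchTemplate` computes with `cv2.TM_CCOEFF_NORMED` when the glyph and the template have the same size. The sketch below is pure Python and hypothetical: glyphs are assumed to be equally sized lists of grayscale rows, and `ncc` and `classify` are illustrative names. Because the means are subtracted and the result is divided by both spreads, a glyph that is uniformly lighter, darker or lower-contrast than its model still scores close to 1.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equally sized glyphs.

    Returns a value in [-1, 1]; 1 means identical up to brightness/contrast.
    A completely flat glyph has no variation to correlate, so it scores 0.
    """
    xs = [p for row in a for p in row]
    ys = [p for row in b for p in row]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def classify(glyph, models):
    """Return the label of the model glyph with the highest correlation."""
    return max(models, key=lambda label: ncc(glyph, models[label]))


# Hypothetical 2x2 model glyphs and a washed-out "1" to classify.
models = {"1": [[255, 0], [255, 0]], "7": [[0, 0], [255, 0]]}
print(classify([[200, 50], [200, 50]], models))
```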
