Scrapy with dynamic captcha

北荒 2021-01-21 12:18

I'm trying to break a captcha within a form on a website, but this captcha is dynamic: it doesn't have a URL. Instead, it has something like this:

         


        
1 answer
  • 2021-01-21 13:02

    Here's a complete solution to bypass the specified captcha using anticaptcha and PIL.

    Because this captcha is generated dynamically, we need to take a screenshot of the img element that contains it. For that, we use save_screenshot() to capture the browser window, then PIL to crop <img name="imagen"... out of the screenshot and save it to disk (captcha.png).
    We then submit captcha.png to anti-captcha, which returns the solution:

    from PIL import Image
    from python_anticaptcha import AnticaptchaClient, ImageToTextTask
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    
    def get_captcha():
        captcha_fn = "captcha.png"
        # img element that holds the captcha image
        # (find_element_by_name was removed in Selenium 4; use By.NAME)
        element = driver.find_element(By.NAME, "imagen")
        location = element.location
        size = element.size
        driver.save_screenshot("temp.png")  # screenshot of the current viewport
    
        # crop box edges: left (x), upper (y), right (width), lower (height)
        x = location['x']
        y = location['y']
        w = size['width']
        h = size['height']
        width = x + w
        height = y + h
    
        im = Image.open('temp.png')
        im = im.crop((int(x), int(y), int(width), int(height)))
        im.save(captcha_fn)
    
        # ask the anti-captcha service to decode the captcha
    
        api_key = 'XXXXXXXXXXXXXXXXXXXXXXXXXX' # api key -> https://anti-captcha.com/
        client = AnticaptchaClient(api_key)
        # keep the file open until the task has been submitted, then close it
        with open(captcha_fn, 'rb') as captcha_fp:
            task = ImageToTextTask(captcha_fp)
            job = client.createTask(task)
            job.join()  # block until the service has a result
        return job.get_captcha_text()
    
    start_url = "YOU KNOW THE URL"
    driver = webdriver.Chrome()
    driver.get(start_url)
    captcha = get_captcha()
    print(captcha)
    

    Output:

    ifds
    

    [Image: captcha.png, the cropped captcha saved by get_captcha()]
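
The crop step in get_captcha() comes down to turning the element's location and size into a (left, upper, right, lower) box. Below is a minimal sketch of that arithmetic as a standalone helper. The name crop_box and the scale parameter are assumptions for illustration, not part of the answer above: on high-DPI displays the screenshot can have more pixels than the page coordinates suggest, and a scale factor is one possible way to compensate.

```python
def crop_box(location, size, scale=1.0):
    """Return a (left, upper, right, lower) box in screenshot pixels.

    `location` and `size` are dicts shaped like Selenium's
    element.location and element.size. `scale` is a hypothetical
    device-pixel-ratio factor (1.0 means no scaling).
    """
    left = location["x"] * scale
    upper = location["y"] * scale
    right = (location["x"] + size["width"]) * scale
    lower = (location["y"] + size["height"]) * scale
    # Round to whole pixels so the box can be passed straight to Image.crop()
    return tuple(int(round(v)) for v in (left, upper, right, lower))


# Example values standing in for element.location and element.size
box = crop_box({"x": 120, "y": 340}, {"width": 150, "height": 50})
print(box)  # (120, 340, 270, 390)
```

Inside get_captcha() the result would be used as im.crop(crop_box(element.location, element.size)). If the cropped image comes out shifted or too small, trying scale=2.0 is a reasonable first check.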


    Notes:

    • Use it at your own risk (be smart);
    • You can improve the code by handling exceptions properly;
    • anticaptcha is a paid service ($0.50 per 1,000 images);
    • I'm not affiliated with anticaptcha.
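
As one way to act on the exception-handling note above, here is a hedged sketch of a small retry wrapper. The name solve_with_retries and its parameters are hypothetical and are not part of python_anticaptcha or Selenium. It only assumes that the solver is a zero-argument callable, such as get_captcha, that either returns text or raises.

```python
import time


def solve_with_retries(solve, attempts=3, delay=2.0, sleep=time.sleep):
    """Call solve() until it returns non-empty text or attempts run out.

    `solve` is any zero-argument callable, e.g. get_captcha.
    `sleep` is injectable so the pause can be skipped in tests.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            text = solve()
            if text:  # treat an empty answer as a failed attempt
                return text
            last_error = ValueError("empty captcha text")
        except Exception as exc:  # narrow this to the errors you actually see
            last_error = exc
        if attempt < attempts:
            sleep(delay)  # brief pause before asking again
    raise RuntimeError(
        f"captcha not solved after {attempts} attempts"
    ) from last_error
```

With the code from the answer, the call would be captcha = solve_with_retries(get_captcha). Catching Exception broadly is a placeholder; once you know which errors the driver and the service raise in practice, list those instead.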