How to Bypass Google reCAPTCHA while scraping with Requests

悲哀的现实 2021-02-02 04:27

Python code to request the URL:

agent = {"User-Agent": 'Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/

1 answer
    北恋 (OP)
    2021-02-02 05:21

    Requesting the page through Google Cache, with a referer header pointing to Google, will help you get past the captcha.
    Things to note:

    • Don't send more than 2 requests per second, or you may get blocked.
    • The result you receive is a cached copy, so this will not work if you are trying to scrape real-time data.

    Example:
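    import requests

    # Browser-like user agent plus a Google referer, as suggested above
    header = {
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
        "referer": "https://www.google.com/",
    }

    # Fetch Google's cached copy of the page instead of the page itself
    r = requests.get("http://webcache.googleusercontent.com/search?q=cache:www.naukri.com/jobs-in-andhra-pradesh", headers=header)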

    This gives:

    >>> r.content
    [2554 lines of output omitted]
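The 2 requests per second advice can be enforced with a small throttle. Below is a minimal sketch, assuming the cache URL format from the example above still works; cache_url, Throttle and fetch_all are hypothetical helpers, not part of requests. The HTTP call is passed in as get, so the throttling logic can be exercised without network access.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple, TypeVar

R = TypeVar("R")

# Cache endpoint copied from the example above (assumption: it is still served).
CACHE_PREFIX = "http://webcache.googleusercontent.com/search?q=cache:"


def cache_url(page: str) -> str:
    """Hypothetical helper: prefix a scheme-less page address with the cache endpoint."""
    return CACHE_PREFIX + page


class Throttle:
    """Hypothetical helper: space successive wait() calls at least 1/rate_per_sec apart."""

    def __init__(self, rate_per_sec: float = 2.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = 1.0 / rate_per_sec
        self.clock = clock
        self.sleep = sleep
        self.last: Optional[float] = None  # time of the previous permitted call

    def wait(self) -> None:
        now = self.clock()
        if self.last is not None:
            delay = self.last + self.interval - now
            if delay > 0:
                self.sleep(delay)  # block until the minimum interval has passed
                now = self.clock()
        self.last = now


def fetch_all(pages: Iterable[str], get: Callable[[str], R],
              throttle: Optional[Throttle] = None) -> Iterator[Tuple[str, R]]:
    """Yield (page, response) pairs, calling get() on each cache URL under the throttle."""
    throttle = throttle if throttle is not None else Throttle()
    for page in pages:
        throttle.wait()
        yield page, get(cache_url(page))
```

With requests, something like functools.partial(requests.get, headers=header, timeout=10) can be passed as get. The default of 2 calls per second mirrors the answer's rule of thumb, not a documented Google limit.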