NTLM authentication with Scrapy for web scraping

礼貌的吻别 2021-02-03 12:03

I am trying to scrape data from a website that requires NTLM authentication.
I can log in successfully using requests with HttpNtlmAuth, as follows:

<
2 answers
  •  情话喂你
    2021-02-03 12:42

    I was able to figure out what was going on.

    1: This is a downloader middleware, not a spider middleware, so it has to be registered under DOWNLOADER_MIDDLEWARES rather than SPIDER_MIDDLEWARES:

    DOWNLOADER_MIDDLEWARES = { 'test.ntlmauth.NTLM_Middleware': 400, }
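
To make point 1 concrete, here is a minimal settings.py sketch. The module path `test.ntlmauth` is taken from the answer; the empty `SPIDER_MIDDLEWARES` dict is included only for contrast and is an assumption about the rest of the project, not something the answer shows.

```python
# settings.py (sketch; "test" is the project package name used in the answer)

# Downloader middlewares sit between the engine and the downloader, so their
# process_request() hook is called for each outgoing request before it is
# fetched. That is the hook the NTLM middleware implements.
DOWNLOADER_MIDDLEWARES = {
    "test.ntlmauth.NTLM_Middleware": 400,
}

# Spider middlewares handle responses and spider output after the download.
# They have no process_request() hook, so a class registered only here would
# never get the chance to authenticate the request.
SPIDER_MIDDLEWARES = {}
```

The value 400 is the ordering number from the answer; it places the middleware relative to Scrapy's built-in downloader middlewares.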
    

    2: The middleware I was trying to use needed significant changes. Here is what works for me:

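    import requests
    from requests_ntlm import HttpNtlmAuth
    from scrapy.http import HtmlResponse

    class NTLM_Middleware(object):
        """Downloader middleware that fetches each request via requests + NTLM."""

        def process_request(self, request, spider):
            # Credentials are read from attributes set on the spider.
            usr = getattr(spider, 'http_user', '')
            pwd = getattr(spider, 'http_pass', '')
            session = requests.Session()
            response = session.get(request.url, auth=HttpNtlmAuth(usr, pwd))
            # Returning a response here makes Scrapy skip its own download.
            # HtmlResponse (not the base Response) keeps selectors usable.
            return HtmlResponse(
                url=request.url,
                status=response.status_code,
                body=response.content,
                request=request,
            )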
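
The middleware above opens a new requests session for every request, which typically means the NTLM handshake is repeated each time. One possible refinement is to keep one session per credential pair and reuse it. The sketch below shows only the caching part, using the standard library; `SessionCache`, `make_session`, and `fake_factory` are hypothetical names, and the wiring to requests is shown in comments as an assumption rather than as tested code.

```python
class SessionCache:
    """Create one session per (user, password) pair and reuse it afterwards."""

    def __init__(self, factory):
        self._factory = factory   # callable(usr, pwd) -> session-like object
        self._sessions = {}

    def get(self, usr, pwd):
        key = (usr, pwd)
        if key not in self._sessions:
            self._sessions[key] = self._factory(usr, pwd)
        return self._sessions[key]


# Hypothetical wiring inside the middleware (needs requests and requests_ntlm):
#
#   def make_session(usr, pwd):
#       s = requests.Session()
#       s.auth = HttpNtlmAuth(usr, pwd)
#       return s
#
#   cache = SessionCache(make_session)
#   response = cache.get(usr, pwd).get(request.url)

# Stand-in factory so the caching behaviour can be checked without a network.
created = []

def fake_factory(usr, pwd):
    created.append(usr)
    return object()

cache = SessionCache(fake_factory)
first = cache.get("DOMAIN\\USER", "PASS")
second = cache.get("DOMAIN\\USER", "PASS")
other = cache.get("DOMAIN\\OTHER", "PASS")
```

Here `first` and `second` are the same object, while `other` gets its own session because the user name differs.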
    

    In the spider, all you need to do is set these two attributes:

    http_user = 'DOMAIN\\USER'
    http_pass = 'PASS'
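
A note on the doubled backslash: it is ordinary Python string escaping, so the value the middleware reads is `DOMAIN\USER` with a single backslash. The short check below illustrates this; `split_ntlm_user` is a hypothetical helper written for this example and is not part of Scrapy or requests_ntlm.

```python
def split_ntlm_user(http_user):
    """Split a 'DOMAIN\\USER' value into (domain, user); domain is '' if absent."""
    domain, sep, user = http_user.partition("\\")
    if not sep:
        return "", http_user
    return domain, user


http_user = 'DOMAIN\\USER'          # escaped form, as written in the spider
same_value = r'DOMAIN\USER'         # raw-string form of the same value

domain, user = split_ntlm_user(http_user)
no_domain = split_ntlm_user('USER')
```

Either spelling can be used in the spider; the raw string simply avoids the escape.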
    
