NTLM authentication with Scrapy for web scraping

礼貌的吻别 2021-02-03 12:03

I am trying to scrape data from a website that requires authentication.
I have been able to log in successfully using requests and HttpNtlmAuth with the following:

2 answers

情话喂你 (original poster)

2021-02-03 12:42

I was able to figure out what was going on.

1: This is a downloader middleware, not a spider middleware, so it has to be registered under DOWNLOADER_MIDDLEWARES rather than SPIDER_MIDDLEWARES:

DOWNLOADER_MIDDLEWARES = { 'test.ntlmauth.NTLM_Middleware': 400, }

2: The middleware which I was trying to use needed to be modified significantly. Here is what works for me:

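import requests
from requests_ntlm import HttpNtlmAuth
from scrapy.http import HtmlResponse


class NTLM_Middleware(object):
    """Downloader middleware that fetches each page with requests and NTLM auth."""

    def process_request(self, request, spider):
        # Credentials come from attributes set on the spider.
        usr = getattr(spider, 'http_user', '')
        pwd = getattr(spider, 'http_pass', '')
        session = requests.Session()
        response = session.get(request.url, auth=HttpNtlmAuth(usr, pwd))
        # Returning a response here makes Scrapy skip its own download.
        # HtmlResponse (rather than the base Response) keeps response.css()
        # and response.xpath() usable in spider callbacks.
        return HtmlResponse(
            url=request.url,
            status=response.status_code,
            body=response.content,
            request=request,
        )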

Within the spider, all you need to do is set these variables:

http_user = 'DOMAIN\\USER'
http_pass = 'PASS'
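
As a rough illustration of how the middleware picks these values up: getattr reads them as plain attributes from the spider object and falls back to an empty string if they are missing. The sketch below uses a stand-in class instead of a real scrapy.Spider subclass so that it runs without Scrapy installed. ExampleSpider and read_credentials are hypothetical names used only for this illustration, not part of the original answer.

```python
class ExampleSpider:
    """Stand-in for a scrapy.Spider subclass (hypothetical, for illustration)."""
    name = 'example'
    # The doubled backslash is a single literal backslash in the string;
    # the raw string r'DOMAIN\USER' would be equivalent.
    http_user = 'DOMAIN\\USER'
    http_pass = 'PASS'


def read_credentials(spider):
    """Mirror the middleware's lookup: missing attributes fall back to ''."""
    usr = getattr(spider, 'http_user', '')
    pwd = getattr(spider, 'http_pass', '')
    return usr, pwd


usr, pwd = read_credentials(ExampleSpider())
# Split the NTLM-style user name into its domain and account parts.
domain, _, account = usr.partition('\\')
print(domain, account)
```

If either attribute is missing, the lookup silently returns an empty string, so a misspelled attribute name is likely to show up as a failed login rather than as a Python error.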
