Scrapy encounters DEBUG: Crawled (400)

滥情空心 2021-01-27 00:41

I'm trying to scrape the page 'https://zhuanlan.zhihu.com/wangzhenotes' with Scrapy.

I run this command:

scrapy shell 'https://zhuanlan.zhihu.com/wangzh


        
1 Answer
  • 2021-01-27 01:11

    Add this middleware to your project's middlewares.py file:

    class CustomMiddleware(object):
        def process_request(self, request, spider):
            request.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
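
    To see what this middleware does to a request, here is a minimal sketch that runs without Scrapy installed. FakeRequest is a hypothetical stand-in for scrapy.Request (the real headers object is case-insensitive and stores bytes), and the "Scrapy/VERSION" string only imitates the shape of Scrapy's default User-Agent.

```python
# Browser-like User-Agent copied from the answer above.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
)


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request, just enough for this sketch."""

    def __init__(self, url, headers=None):
        self.url = url
        self.headers = dict(headers or {})


class CustomMiddleware:
    def process_request(self, request, spider):
        # Overwrite whatever User-Agent was set earlier in the chain.
        request.headers["User-Agent"] = USER_AGENT
        # No explicit return: the method returns None, which tells Scrapy
        # to keep processing the request through the remaining middlewares.


# Simulate a request that already carries a Scrapy-style default User-Agent.
request = FakeRequest(
    "https://zhuanlan.zhihu.com/wangzhenotes",
    headers={"User-Agent": "Scrapy/VERSION (+https://scrapy.org)"},
)
result = CustomMiddleware().process_request(request, spider=None)
print(request.headers["User-Agent"])
```

    The point of the sketch is that the header is assigned unconditionally, so the middleware wins over any value set before it runs.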
    

    Then, in settings.py, replace any entries you previously had in DOWNLOADER_MIDDLEWARES with the new one, like this:

    DOWNLOADER_MIDDLEWARES = {
        'projectname.middlewares.CustomMiddleware': 543,
    }
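
    The number 543 is the middleware's order. Scrapy merges this setting with its built-in DOWNLOADER_MIDDLEWARES_BASE and calls process_request in increasing order, so 543 places the custom middleware after the built-in header middlewares. The sketch below is a simplified imitation of that merge, not Scrapy's own code: BASE_EXCERPT lists only two of the built-in entries, and request_order is a hypothetical helper.

```python
# Excerpt of Scrapy's built-in defaults (only the two header-related entries).
BASE_EXCERPT = {
    "scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware": 400,
    "scrapy.downloadermiddlewares.useragent.UserAgentMiddleware": 500,
}

# The project setting from the answer above.
DOWNLOADER_MIDDLEWARES = {
    "projectname.middlewares.CustomMiddleware": 543,
}


def request_order(base, custom):
    """Hypothetical helper: merge both dicts and sort by order number.

    An entry whose value is None is treated as disabled and dropped.
    """
    merged = {**base, **custom}
    enabled = {path: order for path, order in merged.items() if order is not None}
    return sorted(enabled, key=enabled.get)


order = request_order(BASE_EXCERPT, DOWNLOADER_MIDDLEWARES)
for path in order:
    print(path)
```

    Under these assumptions the custom middleware is listed last, which is why its User-Agent is the one that ends up on the request.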
    

    With the middleware in place, you no longer need this setting:

    DEFAULT_REQUEST_HEADERS = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
    