I'm trying to scrape the page 'https://zhuanlan.zhihu.com/wangzhenotes' with Scrapy.
I ran this command:
scrapy shell 'https://zhuanlan.zhihu.com/wangzh
Add this middleware to your project's middlewares.py file:
class CustomMiddleware(object):
    def process_request(self, request, spider):
        request.headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
Then, in your project's settings.py, replace all the previous downloader middlewares with the new one, like this:
DOWNLOADER_MIDDLEWARES = {
    'projectname.middlewares.CustomMiddleware': 543,
}
You no longer need this setting, because the middleware now sets the header on every request:
DEFAULT_REQUEST_HEADERS = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
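
If you want to sanity-check the middleware before running a crawl, something like the sketch below may help. It repeats the CustomMiddleware class from above and calls process_request by hand. Note that FakeRequest, check_user_agent and the USER_AGENT constant are hypothetical names introduced only for this illustration; FakeRequest is a stand-in for scrapy.Request so the snippet runs without Scrapy installed, and it is not part of Scrapy's API.

```python
# Sketch: exercise the middleware from the answer without starting a crawl.
# FakeRequest is a hypothetical stand-in for scrapy.Request (assumption): it
# only provides the dict-like `headers` attribute that process_request touches.

USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36"
)


class CustomMiddleware:
    def process_request(self, request, spider):
        # Overwrite whatever User-Agent the request carried so far.
        request.headers["User-Agent"] = USER_AGENT
        # Returning None tells Scrapy to keep processing the request normally.
        return None


class FakeRequest:
    """Hypothetical stand-in for scrapy.Request, for illustration only."""

    def __init__(self, url):
        self.url = url
        self.headers = {}


def check_user_agent(url):
    """Run the middleware on a fake request and return (result, header)."""
    request = FakeRequest(url)
    result = CustomMiddleware().process_request(request, spider=None)
    return result, request.headers["User-Agent"]


if __name__ == "__main__":
    print(check_user_agent("https://zhuanlan.zhihu.com/wangzhenotes"))
```

With a real scrapy.Request the headers object stores values as bytes, so reading the header back there should give a bytes value rather than a str. Inside scrapy shell, request.headers shows what was actually sent.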