HTTP 403 Responses when using Python Scrapy


I am using the Python.org 2.7 64-bit build on 64-bit Windows Vista. I have been testing the following Scrapy code to recursively scrape all the pages at the site www.whoscored


2 Answers

一个人的身影

2020-12-31 14:44

I do not know if this is still relevant, but I had to put the following lines in the settings.py file:

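# Let 404 responses reach the spider callbacks instead of being filtered out
HTTPERROR_ALLOWED_CODES = [404]
# Project-template user agent, superseded by the browser-like one below
# USER_AGENT = 'quotesbot (+http://www.yourdomain.com)'
USER_AGENT = "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36"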

Hope it helps.
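
If you want to check whether the User-Agent header is really what triggers the 403 before touching the Scrapy project, a small standard-library probe can help. This is only a sketch, and it assumes Python 3 rather than the 2.7 from the question; build_request and fetch_status are hypothetical helper names, and the URL you pass in is up to you.

```python
import urllib.error
import urllib.request

# Browser-like user agent taken from the settings above.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36"
)


def build_request(url, user_agent=None):
    """Build a request, optionally overriding the User-Agent header.

    With user_agent=None no header is set here, so urllib falls back to
    its own default user agent when the request is opened.
    """
    headers = {"User-Agent": user_agent} if user_agent else {}
    return urllib.request.Request(url, headers=headers)


def fetch_status(url, user_agent=None, timeout=10):
    """Return the HTTP status code for url (performs a real network call)."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is carried on the error.
        return err.code
```

Calling fetch_status(url) and fetch_status(url, BROWSER_UA) against the same page and comparing the two codes gives a rough indication. If both return 403, the site is probably looking at more than the user agent.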


栀梦

2020-12-31 14:49

HTTP Status Code 403 definitely means Forbidden / Access Denied.
HTTP Status Code 302 is for redirection of requests. No need to worry about them.
Nothing seems to be wrong in your code.

Yes, it's definitely an anti-scraping measure implemented by the site.

Refer to these guidelines in the Scrapy docs: Avoid Getting Banned
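
One of the suggestions in those guidelines is rotating the user agent from a pool of browser user agents. A minimal sketch of that idea as a downloader middleware might look like the following. The class name, the pool contents, and the myproject.middlewares path are all hypothetical, and the process_request(request, spider) signature is the one I know from the Scrapy versions I have used, so check it against your version.

```python
import random

# Illustrative pool; replace with user agents of browsers you actually target.
USER_AGENT_POOL = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/34.0.1847.131 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
]


class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware: one random user agent per request."""

    def __init__(self, user_agents=None, rng=None):
        self.user_agents = list(user_agents or USER_AGENT_POOL)
        self.rng = rng or random.Random()

    def process_request(self, request, spider):
        # Set the header before the request is downloaded.
        request.headers["User-Agent"] = self.rng.choice(self.user_agents)
        # Returning None tells Scrapy to keep processing the request normally.
        return None


# To enable it, something like this would go in settings.py
# (the module path is an assumption about your project layout):
#
# DOWNLOADER_MIDDLEWARES = {
#     "myproject.middlewares.RandomUserAgentMiddleware": 400,
# }
```

The priority 400 is chosen so that this runs before Scrapy's built-in user agent middleware, which only fills in the header when none is set.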

Also, you should consider pausing and resuming crawls.
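
For pausing and resuming, Scrapy persists the crawl state when a job directory is configured. A rough settings.py fragment, assuming a directory name of my own choosing and delay values that are only starting points, could be:

```python
# settings.py (fragment)

# Directory where Scrapy stores scheduler state so a crawl can be resumed.
# The name is hypothetical; use a separate directory per spider run.
JOBDIR = "crawls/whoscored-1"

# Slow the crawl down; both values are guesses to tune for the target site.
DOWNLOAD_DELAY = 2
AUTOTHROTTLE_ENABLED = True

# Some sites use cookies to spot bot behaviour.
COOKIES_ENABLED = False

# The same job directory can also be passed on the command line instead:
#   scrapy crawl somespider -s JOBDIR=crawls/somespider-1
```

Stop the crawl with a single Ctrl-C so it can shut down cleanly, then run the same command again to resume from where it left off.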
