Speed up web scraper

野趣味 2021-01-30 03:45

I am scraping 23770 webpages with a pretty simple web scraper using Scrapy. I am quite new to Scrapy and even Python, but I managed to write a spider that does the job.

4 Answers
  •  无人共我
    2021-01-30 04:08

    Here's a collection of things to try:

    • use the latest Scrapy version (if you are not already)
    • check whether any non-standard middlewares are enabled
    • try increasing the CONCURRENT_REQUESTS_PER_DOMAIN and CONCURRENT_REQUESTS settings (docs); a settings sketch follows this list
    • turn off logging with LOG_ENABLED = False (docs)
    • try yielding each item in a loop instead of collecting items into a list and returning it (sketch below)
    • use a local DNS cache (see this thread)
    • check whether the site throttles downloads and limits your download speed (see this thread)
    • log CPU and memory usage during the spider run to see whether there are bottlenecks there
    • try running the same spider under the scrapyd service
    • see if grequests + lxml performs better (example at the end of this answer; ask if you need help implementing it)
    • try running Scrapy on PyPy, see Running Scrapy on PyPy
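
    For the concurrency and logging items above, a minimal settings.py sketch; the numbers are only starting points to experiment with, not tuned values for your site:

        # settings.py -- experiment with these values and measure the effect
        CONCURRENT_REQUESTS = 100            # global request cap (Scrapy default: 16)
        CONCURRENT_REQUESTS_PER_DOMAIN = 32  # per-domain cap (default: 8)
        LOG_ENABLED = False                  # drop per-request log output
        DNSCACHE_ENABLED = True              # Scrapy's built-in DNS cache (on by default)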

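    To show what yielding items in a loop looks like, here is a rough parse callback; the spider name, URL and XPath expressions are placeholders, not taken from your code:

        import scrapy

        class PageSpider(scrapy.Spider):
            name = "pages"                               # placeholder name
            start_urls = ["https://example.com/list"]    # placeholder URL

            def parse(self, response):
                # yield each item as soon as it is extracted,
                # instead of appending to a list and returning the list at the end
                for row in response.xpath("//div[@class='row']"):   # placeholder XPath
                    yield {
                        "title": row.xpath(".//h2/text()").get(),
                        "url": response.urljoin(row.xpath(".//a/@href").get(default="")),
                    }

    This keeps memory flat and lets the item pipeline start processing items while the page is still being parsed.
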
    Hope that helps.
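
    If you want to benchmark the grequests + lxml route mentioned in the list, a rough sketch for comparison; the URL list and XPath are placeholders, and both libraries are assumed to be installed:

        import grequests                      # gevent-based concurrent wrapper around requests
        from lxml import html

        urls = ["https://example.com/page/%d" % i for i in range(1, 11)]   # placeholder URLs

        # build the requests lazily, then fire them with a pool of 20 concurrent workers
        reqs = (grequests.get(u, timeout=30) for u in urls)
        for resp in grequests.map(reqs, size=20):
            if resp is None or resp.status_code != 200:
                continue                      # request failed or was rejected
            tree = html.fromstring(resp.content)
            print(resp.url, tree.xpath("//title/text()"))   # placeholder XPath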
