Speed up web scraper

野趣味 2021-01-30 03:45

I am scraping 23770 webpages with a pretty simple web scraper using scrapy. I am quite new to scrapy and even python, but managed to write a spider that does the jo

4 answers

迷失自我 (original poster)

2021-01-30 04:00

Looking at your code, I'd say most of that time is spent waiting on network requests rather than processing the responses. All of the tips @alecxe gives in his answer apply, but I'd also suggest the HTTPCACHE_ENABLED setting: it caches every response, so a repeated request is served from the cache instead of going over the network again. That won't make the first crawl faster, but it helps on subsequent crawls and even lets you develop offline. See the docs for more info: http://doc.scrapy.org/en/latest/topics/downloader-middleware.html#module-scrapy.contrib.downloadermiddleware.httpcache
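
For reference, a minimal sketch of what enabling the cache could look like in the project's settings.py. Only HTTPCACHE_ENABLED comes from the suggestion above; the other settings are standard Scrapy cache options, and the values shown are assumptions to adjust for your own project, not a recommendation.

```python
# settings.py (sketch): enable Scrapy's built-in HTTP cache.
# Every value except HTTPCACHE_ENABLED is an illustrative assumption.

# Turn on the HTTP cache middleware so responses are stored locally.
HTTPCACHE_ENABLED = True

# 0 means cached responses never expire; set a number of seconds
# if the target pages change and you need fresh copies.
HTTPCACHE_EXPIRATION_SECS = 0

# A relative path here is resolved inside the project's .scrapy data directory.
HTTPCACHE_DIR = "httpcache"

# Skip caching server errors so a temporary failure is retried
# on the next crawl instead of being replayed from the cache.
HTTPCACHE_IGNORE_HTTP_CODES = [500, 502, 503, 504]
```

With this in place, the first run still pays the full network cost; later runs of the same spider read the stored responses from disk, which is mostly useful while you are iterating on the parsing code.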
