I am scraping 23770 webpages with a pretty simple web scraper using Scrapy. I am quite new to Scrapy and even to Python, but I managed to write a spider that does the job.
Here's a collection of things to try:
- Experiment with the `CONCURRENT_REQUESTS_PER_DOMAIN` and `CONCURRENT_REQUESTS` settings (see the docs).
- Turn off logging with `LOG_ENABLED = False` (see the docs).
- Yield each item inside the loop instead of collecting items into an `items` list and returning them.
- Run Scrapy on PyPy; see "Running Scrapy on PyPy".

Hope that helps.