I am scraping 23,770 webpages with a pretty simple web scraper using scrapy. I am quite new to scrapy and even Python, but managed to write a spider that does the job.
One workaround to speed up your scrapy crawl is to configure your start_urls appropriately.
For example, if our target data lives at http://apps.webofknowledge.com/doc=1, where the doc number ranges from 1 to 1000, you can configure your start_urls as follows:
start_urls = [
"http://apps.webofknowledge.com/doc=250",
"http://apps.webofknowledge.com/doc=750",
]
In this way, the crawl starts from doc 250 (following links outward to 251, 249, and so on) and from doc 750 (outward to 751, 749, and so on) simultaneously. With two start points each crawling in two directions, you get roughly 4 times the throughput compared to start_urls = ["http://apps.webofknowledge.com/doc=1"], which can only crawl forward from one end.
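Rather than hard-coding the midpoints, you can compute them for any document count and any number of start points. This is a minimal sketch (the helper name spaced_start_urls and the /doc= URL pattern are assumptions based on the example above): it splits the range into equal slices and returns a URL at the midpoint of each slice.

```python
def spaced_start_urls(base_url, total_docs, num_starts):
    """Return one URL at the midpoint of each equal slice of the doc range,
    so each start point can crawl outward in both directions."""
    slice_size = total_docs // num_starts
    return [
        f"{base_url}/doc={i * slice_size + slice_size // 2}"
        for i in range(num_starts)
    ]

# For 1000 docs and 2 start points, this reproduces the list above:
urls = spaced_start_urls("http://apps.webofknowledge.com", 1000, 2)
# urls == ["http://apps.webofknowledge.com/doc=250",
#          "http://apps.webofknowledge.com/doc=750"]
```

You could then assign the result directly to start_urls in your spider class; raising num_starts adds more parallel crawl fronts, subject to scrapy's CONCURRENT_REQUESTS settings.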