Speed up web scraper

野趣味 2021-01-30 03:45

I am scraping 23,770 webpages with a pretty simple web scraper using Scrapy. I am quite new to Scrapy and even Python, but managed to write a spider that does the jo

4 answers
  •  春和景丽
    2021-01-30 04:18

    One way to speed up your Scrapy spider is to choose your start_urls carefully.

    For example, if the target data is at http://apps.webofknowledge.com/doc=1 and the doc number ranges from 1 to 1000, you can set start_urls as follows:

    start_urls = [
        "http://apps.webofknowledge.com/doc=250",
        "http://apps.webofknowledge.com/doc=750",
    ]
    

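    This way, assuming the spider follows links to the neighbouring documents in both directions, requests spread outward from 250 (to 251 and 249) and from 750 (to 751 and 749) at the same time, so the crawl can be up to about 4 times faster than with start_urls = ["http://apps.webofknowledge.com/doc=1"].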
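The arithmetic behind the "4 times" figure can be checked with a small model. This is only a sketch in plain Python, not Scrapy code: it assumes the spider moves one document at a time to the adjacent doc numbers, and the helper names (build_start_urls, sequential_walks, longest_walk) are made up for illustration. It also ignores Scrapy's own concurrency settings, which affect real crawl times.

```python
# URL pattern taken from the example above.
BASE_URL = "http://apps.webofknowledge.com/doc={}"


def build_start_urls(seeds):
    """Turn seed doc numbers into a start_urls list (hypothetical helper)."""
    return [BASE_URL.format(n) for n in seeds]


def sequential_walks(seeds, lo, hi):
    """Model the crawl as walks that leave each seed in both directions.

    Assumption: each walk visits one neighbouring doc number per step, and
    two walks heading toward each other meet halfway between their seeds.
    Returns a list of walks; each walk is the list of doc numbers it visits
    (the seeds themselves are not included).
    """
    seeds = sorted(seeds)
    walks = []
    for i, seed in enumerate(seeds):
        # Lowest and highest doc number this seed is responsible for.
        low = lo if i == 0 else (seeds[i - 1] + seed) // 2 + 1
        high = hi if i == len(seeds) - 1 else (seed + seeds[i + 1]) // 2
        walks.append(list(range(seed - 1, low - 1, -1)))  # downward walk
        walks.append(list(range(seed + 1, high + 1)))     # upward walk
    return walks


def longest_walk(seeds, lo, hi):
    """Length of the longest sequential walk, a rough proxy for crawl time."""
    return max(len(walk) for walk in sequential_walks(seeds, lo, hi))


if __name__ == "__main__":
    print(build_start_urls([250, 750]))
    print(longest_walk([1], 1, 1000), longest_walk([250, 750], 1, 1000))
```

Under these assumptions the longest walk drops from 999 pages (a single seed at doc 1) to 250 pages (seeds at 250 and 750), which is where the roughly fourfold speedup comes from.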
