Speed up web scraper

野趣味 2021-01-30 03:45

I am scraping 23770 webpages with a fairly simple web scraper built with Scrapy. I am quite new to Scrapy, and even to Python, but I managed to write a spider that does the jo

4 answers
  •  长情又很酷
    2021-01-30 03:58

    I also work on web scraping, using optimized C#, and it ends up CPU-bound, so I am switching to C.

    Parsing HTML thrashes the CPU data cache, and I am pretty sure your scraper is not using SSE 4.2 at all, since in practice you only get direct access to those instructions from C/C++.

    If you do the math, you quickly become compute-bound rather than memory-bound.
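
One way to check the CPU-bound claim against your own pages is to time the parsing step on its own. The following is only a minimal sketch under stated assumptions: it uses Python's standard-library `html.parser` as a stand-in (Scrapy's selectors are built on lxml, so real numbers will differ), the page is synthetic, and `LinkCounter`, `parse_cpu_seconds`, and `sample_html` are hypothetical names introduced for illustration. The 23770 figure is the page count from the question.

```python
import time
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a> tags; a stand-in for real extraction logic."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1


def parse_cpu_seconds(html, repeats=100):
    """Return (average CPU seconds per parse, links found in one parse)."""
    links = 0
    start = time.process_time()  # CPU time only, excludes sleeping/waiting
    for _ in range(repeats):
        parser = LinkCounter()
        parser.feed(html)
        parser.close()
        links = parser.links
    elapsed = time.process_time() - start
    return elapsed / repeats, links


# Synthetic page (an assumption, not real data): 2000 paragraphs, one link each.
sample_html = "<html><body>" + "<p><a href='/x'>x</a></p>" * 2000 + "</body></html>"

per_page, links = parse_cpu_seconds(sample_html)
print(f"links per page: {links}")
print(f"CPU time per parse: {per_page * 1000:.3f} ms")
print(f"estimated CPU time for 23770 pages: {per_page * 23770:.1f} s")
```

If the estimated parsing time is small compared with the total wall-clock time of the crawl, the scraper is more likely waiting on the network than on the CPU, and the fix would lie in concurrency settings rather than in a faster parser.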
