Speed up web scraper

野趣味 2021-01-30 03:45

I am scraping 23770 webpages with a pretty simple web scraper using Scrapy. I am quite new to Scrapy and even Python, but managed to write a spider that does the jo

4 answers

长情又很酷 (original poster)

2021-01-30 03:58

I also work on web scraping, using optimized C#, and it ends up CPU bound, so I am switching to C.

Parsing HTML thrashes the CPU data cache, and I am pretty sure your CPU is not using SSE 4.2 at all, since as far as I know you can only access this feature from C/C++.

If you do the math, you quickly become compute bound rather than memory bound.
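To "do the math" for your own case, one rough way is to time how much CPU a single page costs to parse and multiply by the 23770 pages from the question. This is only a sketch under assumptions: it uses the standard library's `html.parser` as a stand-in for whatever parsing the real spider does, and `LinkCounter`, `parse_cpu_seconds`, `estimate_total_parse_seconds` and `SAMPLE_PAGE` are hypothetical names, not part of Scrapy. Swap in a saved response body from the real site to get a meaningful number.

```python
import time
from html.parser import HTMLParser


class LinkCounter(HTMLParser):
    """Counts <a> tags; a stand-in for the real spider's parsing work."""

    def __init__(self):
        super().__init__()
        self.links = 0

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links += 1


def parse_cpu_seconds(html, repeats=20):
    """Average CPU seconds spent parsing one page (network time excluded)."""
    start = time.process_time()
    for _ in range(repeats):
        parser = LinkCounter()
        parser.feed(html)
        parser.close()
    return (time.process_time() - start) / repeats


def estimate_total_parse_seconds(html, pages=23770):
    """Scale the per-page parsing cost up to the whole crawl."""
    return parse_cpu_seconds(html) * pages


# Synthetic page (an assumption); replace with a saved response body.
SAMPLE_PAGE = (
    "<html><body>"
    + "<p><a href='/x'>link</a> text</p>" * 2000
    + "</body></html>"
)

if __name__ == "__main__":
    per_page = parse_cpu_seconds(SAMPLE_PAGE)
    total = estimate_total_parse_seconds(SAMPLE_PAGE)
    print(f"parse CPU per page: {per_page * 1000:.2f} ms")
    print(f"estimated parse CPU for 23770 pages: {total:.1f} s")
```

If the estimated parsing time is small next to the wall-clock time of the whole crawl, the scraper is probably waiting on the network rather than on the CPU, and a faster parser would not help much.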
