CPU-intensive parsing with Scrapy

[愿得一人] 2021-01-15 00:34

The CONCURRENT_ITEMS section at http://doc.scrapy.org/en/latest/topics/settings.html#concurrent-items defines it as:

Maximum number of concurrent items (per response) to process in parallel in the Item Processor (also known as the Item Pipeline).

2 Answers
  •  孤街浪徒
    2021-01-15 01:30

    The request system also works in parallel; see http://doc.scrapy.org/en/latest/topics/settings.html#concurrent-requests. Scrapy is designed so that requesting and parsing both happen in the spider itself: the callback methods make it asynchronous, and by default multiple requests do run in parallel.
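Both knobs are ordinary settings. A sketch of what they look like in settings.py (the values shown are Scrapy's documented defaults, not tuning advice):

```python
# settings.py: illustrative values (Scrapy's documented defaults)
CONCURRENT_REQUESTS = 16             # parallel requests across the whole crawl
CONCURRENT_REQUESTS_PER_DOMAIN = 8   # parallel requests per domain
CONCURRENT_ITEMS = 100               # parallel items (per response) in the item pipeline
```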

    The item pipeline, which also processes items in parallel, isn't intended for heavy parsing: it is meant to check and validate the values you got in each item. (http://doc.scrapy.org/en/latest/topics/item-pipeline.html)
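A pipeline is just a class with a process_item(self, item, spider) method. A minimal validation sketch of the kind of light work pipelines are meant for (the "price" field is hypothetical; a real pipeline would raise scrapy.exceptions.DropItem, but a plain ValueError stands in here to keep the sketch dependency-free):

```python
class ValidatePricePipeline:
    """Lightweight validation, the kind of work pipelines are meant for.
    'price' is a hypothetical item field; in a real project you would
    raise scrapy.exceptions.DropItem instead of ValueError."""

    def process_item(self, item, spider):
        price = item.get("price")
        if price is None or price < 0:
            # Reject items that fail validation.
            raise ValueError(f"invalid price in {item!r}")
        item["price"] = round(price, 2)  # cheap normalisation, not heavy parsing
        return item
```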

    Therefore you should do your heavy parsing in the spider itself, where it is designed to live. From the docs on spiders:

    Spiders are classes which define how a certain site (or group of sites) will be scraped, including how to perform the crawl (ie. follow links) and how to extract structured data from their pages (ie. scraping items).
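Putting the CPU-heavy work in the spider callback might look like the sketch below. The parsing step is kept in a plain function (stdlib re stands in for real parsing logic), and the spider wiring around it is shown in comments, since the class name, spider name, and URL are all hypothetical and assume Scrapy is installed:

```python
import re

def extract_titles(html):
    # CPU-heavy parsing would live here; a simple regex stands in.
    return re.findall(r"<h1>(.*?)</h1>", html)

# Hypothetical spider wiring (assumes Scrapy is installed):
# import scrapy
#
# class ExampleSpider(scrapy.Spider):
#     name = "example"                      # hypothetical
#     start_urls = ["http://example.com"]   # hypothetical
#
#     def parse(self, response):
#         # Heavy parsing happens in the callback, as the docs suggest;
#         # other requests keep downloading while this one is parsed.
#         for title in extract_titles(response.text):
#             yield {"title": title}
#         for href in response.css("a::attr(href)").getall():
#             yield response.follow(href, callback=self.parse)
```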
