Scrapy not crawling subsequent pages in order

野的像风 2020-12-21 07:04

I am writing a crawler to get the names of items from a website. The website has 25 items per page and multiple pages (200 for some item types).

Here is the co

1 Answer
  • 2020-12-21 07:38

    Scrapy is an asynchronous framework. It uses non-blocking I/O, so it doesn't wait for one request to finish before starting the next.

    Because multiple requests can be in flight at the same time, there is no way to know the exact order in which the parse() method will receive the responses.
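To make the ordering issue concrete, here is a small simulation. It is only an analogy: it uses Python's asyncio rather than Scrapy (which is built on Twisted), and the `fetch` and `crawl` functions and the per-page delays are invented for illustration. All "requests" start together, and results are collected in the order they finish, not the order they were issued.

```python
import asyncio


async def fetch(page, delay):
    # Stand-in for a network request; the delay models server latency.
    await asyncio.sleep(delay)
    return page


async def crawl(delays):
    # Start every "request" at once, like a crawler with concurrency > 1.
    tasks = [
        asyncio.create_task(fetch(page, delay))
        for page, delay in delays.items()
    ]
    arrival_order = []
    # as_completed hands results back in completion order.
    for finished in asyncio.as_completed(tasks):
        arrival_order.append(await finished)
    return arrival_order


if __name__ == "__main__":
    # Hypothetical latencies in seconds: page 1 is the slowest to respond.
    delays = {1: 0.15, 2: 0.05, 3: 0.10}
    print(asyncio.run(crawl(delays)))  # [2, 3, 1]
```

Pages were requested as 1, 2, 3 but arrive as 2, 3, 1, because the arrival order depends on how long each response takes.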

    My point is that Scrapy is not designed to extract data in a particular order. If you absolutely need to preserve order, there are some ideas here: Scrapy Crawl URLs in Order
