Scrapy: What's the correct way to use start_requests()?

暖寄归人 2021-01-02 07:07

This is how my spider is set up:

class CustomSpider(CrawlSpider):
    name = 'custombot'
    allowed_domains = ['www.domain.com']
    start_urls = ['http


        
1 Answer
  • 2021-01-02 07:07

    According to the documentation for start_requests, overriding start_requests means that the URLs defined in start_urls are ignored:

    This is the method called by Scrapy when the spider is opened for scraping when no particular URLs are specified. If particular URLs are specified, the make_requests_from_url() is used instead to create the Requests.
    [...]
    If you want to change the Requests used to start scraping a domain, this is the method to override.

    If you want to scrape only from /some-url, remove start_requests. If you want to scrape from both, add /some-url to the start_urls list.
