This is how my spider is set up:
class CustomSpider(CrawlSpider):
    name = 'custombot'
    allowed_domains = ['www.domain.com']
    start_urls = ['http
From the documentation for start_requests, overriding start_requests means that the URLs defined in start_urls are ignored:
This is the method called by Scrapy when the spider is opened for scraping when no particular URLs are specified. If particular URLs are specified, the make_requests_from_url() is used instead to create the Requests.
[...]
If you want to change the Requests used to start scraping a domain, this is the method to override.
If you just want to scrape from /some-url, then remove start_requests. If you want to scrape from both, then add /some-url to the start_urls list.
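To make the difference concrete, here is a minimal pure-Python sketch of the behaviour described above. It does not import Scrapy: BaseSpider is a simplified stand-in I wrote for illustration, the "requests" are plain URL strings, and the full URLs (including /other-url) are hypothetical, since the original start_urls value is cut off in the question. The last class shows one possible way to keep an override and still use start_urls, by delegating to the parent method.

```python
class BaseSpider:
    """Simplified stand-in for a spider base class (NOT the real Scrapy API)."""
    start_urls = []

    def start_requests(self):
        # Default behaviour: produce one "request" per entry in start_urls.
        for url in self.start_urls:
            yield url


class DefaultSpider(BaseSpider):
    # No override, so the default loop over start_urls is used.
    start_urls = ["http://www.domain.com/some-url"]  # hypothetical URL


class OverridingSpider(BaseSpider):
    start_urls = ["http://www.domain.com/some-url"]  # hypothetical URL

    def start_requests(self):
        # The override replaces the default loop, so start_urls is never read.
        yield "http://www.domain.com/other-url"  # hypothetical URL


class BothSpider(BaseSpider):
    start_urls = ["http://www.domain.com/some-url"]  # hypothetical URL

    def start_requests(self):
        # Emit the custom request first, then fall back to the default loop.
        yield "http://www.domain.com/other-url"  # hypothetical URL
        yield from super().start_requests()


if __name__ == "__main__":
    for spider_cls in (DefaultSpider, OverridingSpider, BothSpider):
        print(spider_cls.__name__, list(spider_cls().start_requests()))
```

In a real spider the yielded items would be Request objects rather than strings, but the control flow is the same: whatever start_requests yields is what gets crawled first.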