How to run several versions of a single spider at the same time with Scrapy?

Asked 2020-12-22 00:54

My problem is the following:

To save time, I would like to run several versions of one single spider. The process (parsing definitions) is the same.

2 Answers
  • 2020-12-22 01:27

    So, I found a solution inspired by scrapy crawl -a variable=value.

    The spider concerned, in the "spiders" folder, was transformed as follows:

    import scrapy

    class MySpider(scrapy.Spider):
        name = "arg"
        allowed_domains = ['www.website.com']

        def __init__(self, lo_lim=None, up_lim=None, type_of_race=None):  # e.g. lo_lim=2017, up_lim=2019, type_of_race=pmu
            # -a arguments arrive as strings, so the limits must be converted to int
            year  = range(int(lo_lim), int(up_lim))  # lower limit (inclusive) to upper limit (exclusive)
            month = range(1, 13)  # 12 months
            day   = range(1, 32)  # 31 days
            url = []
            for y in year:
                for m in month:
                    for d in day:
                        url.append("https://www.website.com/details/{}-{}-{}/{}/meeting".format(y, m, d, type_of_race))

            self.start_urls = url  # e.g. ["https://www.website.com/details/2017-1-1/pmu/meeting",
                                   #       "https://www.website.com/details/2017-1-2/pmu/meeting",
                                   #       ...
                                   #       "https://www.website.com/details/2018-12-31/pmu/meeting"]

        def parse(self, response):
            ...
    

    This answers my problem: I keep one single spider, and I can run several versions of it with several commands at the same time without trouble.
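
    For example, each version gets its own terminal and its own argument values (the values here are the illustrative ones from the comments in the code above):

        scrapy crawl arg -a lo_lim=2017 -a up_lim=2018 -a type_of_race=pmu
        scrapy crawl arg -a lo_lim=2018 -a up_lim=2019 -a type_of_race=pmu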

    Without a def __init__ it didn't work for me. I tried a lot of approaches; this code could be improved, but it works for me.
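
    One caveat that is not in the original answer, but follows Scrapy's documented recommendation when overriding __init__: also accept *args/**kwargs and forward them to the base class, so Scrapy's own initialization still runs (Python 2 style super, matching the versions below):

        def __init__(self, lo_lim=None, up_lim=None, type_of_race=None, *args, **kwargs):
            super(MySpider, self).__init__(*args, **kwargs)  # keep Scrapy's base Spider initialization
            # ... build start_urls as above ...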

    Scrapy version: 1.5.0, Python version: 2.7.9, MongoDB version: 3.6.4, PyMongo version: 3.6.1

  • 2020-12-22 01:53

    Scrapy supports spider arguments. Weirdly enough, there's no straightforward documentation, but I'll try to fill in the gap:

    When you run a crawl command you may provide -a NAME=VALUE arguments, and these will be set as instance attributes on your spider. For example:

    from scrapy import Request, Spider

    class MySpider(Spider):
        name = 'arg'
        # we will set these below when running the crawler
        foo = None
        bar = None

        def start_requests(self):
            url = f'http://example.com/{self.foo}/{self.bar}'
            yield Request(url)


    And if we run it:

    scrapy crawl arg -a foo=1 -a bar=2
    # will crawl example.com/1/2
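
    If you would rather launch the parameterized runs from one script than from several terminals, a minimal sketch using Scrapy's CrawlerProcess (reusing the MySpider, foo and bar names from the example above) could look like this:

        from scrapy.crawler import CrawlerProcess

        process = CrawlerProcess()
        # each crawl() call schedules a separate instance of the same spider class;
        # keyword arguments reach the spider exactly like -a NAME=VALUE does
        process.crawl(MySpider, foo='1', bar='2')
        process.crawl(MySpider, foo='3', bar='4')
        process.start()  # blocks until both crawls have finished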
    