How can Scrapy export items to separate CSV files per item type?

隐瞒了意图╮ 2020-12-08 05:02

I am scraping a soccer site and the spider (a single spider) gets several kinds of items from the site's pages: Team, Match, Club, etc. I am trying to use the CsvItemExporter to write each item type to its own CSV file, e.g. teams.csv, matches.csv, clubs.csv.

3 Answers
  • 2020-12-08 05:45

    I am posting here the code I used to produce a MultiCSVItemPipeline, based on the answer from drcolossos.

    This pipeline assumes that all the Item classes follow the *Item naming convention (e.g. TeamItem, EventItem), creates team.csv, event.csv, etc., and sends each record to the appropriate CSV file.

    from scrapy.exporters import CsvItemExporter
    from scrapy import signals
    from scrapy.xlib.pydispatch import dispatcher  # note: removed in recent Scrapy releases (see the answer below)
    
    CSVDir = ''  # directory prefix for the generated csv files; adjust to your project
    
    
    def item_type(item):
        return type(item).__name__.replace('Item','').lower()  # TeamItem => team
    
    class MultiCSVItemPipeline(object):
        SaveTypes = ['team','club','event', 'match']
        def __init__(self):
            dispatcher.connect(self.spider_opened, signal=signals.spider_opened)
            dispatcher.connect(self.spider_closed, signal=signals.spider_closed)
    
        def spider_opened(self, spider):
            self.files = dict([ (name, open(CSVDir+name+'.csv','w+b')) for name in self.SaveTypes ])
            self.exporters = dict([ (name,CsvItemExporter(self.files[name])) for name in self.SaveTypes])
            [e.start_exporting() for e in self.exporters.values()]
    
        def spider_closed(self, spider):
            [e.finish_exporting() for e in self.exporters.values()]
            [f.close() for f in self.files.values()]
    
        def process_item(self, item, spider):
            what = item_type(item)
            if what in set(self.SaveTypes):
                self.exporters[what].export_item(item)
            return item
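
    Note that the pipeline only runs once it is enabled in the project settings; a minimal sketch, assuming the class lives in a (hypothetical) myproject/pipelines.py:

    # settings.py
    ITEM_PIPELINES = {
        'myproject.pipelines.MultiCSVItemPipeline': 300,
    }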
    
  • 2020-12-08 05:57

    I have tried the answer above, but it does not seem to work in the latest version (2.21).

    I have included my code for your reference:

    from scrapy.exporters import CsvItemExporter
    
    
    class MultiCSVItemPipeline(object):
        # the output files are named after the full Item class names, e.g. CentalineTransactionsItem.csv
        SaveTypes = ['CentalineTransactionsItem', 'CentalineTransactionsDetailItem', 'CentalineBuildingInfo']
    
        def open_spider(self, spider):
            self.files = dict([ (name, open(name+'.csv','w+b')) for name in self.SaveTypes ])
            self.exporters = dict([ (name,CsvItemExporter(self.files[name])) for name in self.SaveTypes])
            [e.start_exporting() for e in self.exporters.values()]
    
        def close_spider(self, spider):
            [e.finish_exporting() for e in self.exporters.values()]
            [f.close() for f in self.files.values()]
    
        def process_item(self, item, spider):
            what = type(item).__name__
            if what in set(self.SaveTypes):
                self.exporters[what].export_item(item)
            return item
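
    The likely reason the first answer breaks on recent Scrapy releases is that scrapy.xlib.pydispatch has been removed. Besides the open_spider/close_spider hooks used above, the signals can also be connected through the pipeline's from_crawler classmethod. A rough sketch of that variant (not taken from either answer, and reusing the team/club/event/match naming from the first one):

    from scrapy import signals
    from scrapy.exporters import CsvItemExporter
    
    
    class MultiCSVItemPipeline(object):
        SaveTypes = ['team', 'club', 'event', 'match']
    
        @classmethod
        def from_crawler(cls, crawler):
            pipeline = cls()
            # crawler.signals replaces the removed scrapy.xlib.pydispatch dispatcher
            crawler.signals.connect(pipeline.spider_opened, signal=signals.spider_opened)
            crawler.signals.connect(pipeline.spider_closed, signal=signals.spider_closed)
            return pipeline
    
        def spider_opened(self, spider):
            # one file and one exporter per item type
            self.files = {name: open(name + '.csv', 'wb') for name in self.SaveTypes}
            self.exporters = {name: CsvItemExporter(f) for name, f in self.files.items()}
            for exporter in self.exporters.values():
                exporter.start_exporting()
    
        def spider_closed(self, spider):
            for exporter in self.exporters.values():
                exporter.finish_exporting()
            for f in self.files.values():
                f.close()
    
        def process_item(self, item, spider):
            name = type(item).__name__.replace('Item', '').lower()  # TeamItem -> team
            if name in self.SaveTypes:
                self.exporters[name].export_item(item)
            return item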
        
    
  • 2020-12-08 06:01

    Your approach seems fine to me. Pipelines are a great feature of Scrapy and are, IMO, built for exactly this kind of use case.

    You could create multiple item classes (e.g. SoccerItem, MatchItem) and, in your MultiCSVItemPipeline, delegate each item to its own CSV exporter by checking the item's class.
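
    A rough sketch of what such item classes could look like (the field names are made up purely for illustration):

    import scrapy
    
    
    class TeamItem(scrapy.Item):
        name = scrapy.Field()
        country = scrapy.Field()
    
    
    class MatchItem(scrapy.Item):
        home_team = scrapy.Field()
        away_team = scrapy.Field()
        score = scrapy.Field()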
