How can Scrapy export items to separate CSV files per item type?

Submitted by 旧街凉风 on 2019-11-28 05:29:30

Your approach seems fine to me. Pipelines are a great feature of Scrapy and, IMO, are built for exactly this kind of approach.

You could create multiple item classes (e.g. SoccerItem, MatchItem) and, in your MultiCSVItemPipeline, simply delegate each item to its own CSV exporter by checking the item's class.
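A minimal sketch of that delegation, written with only the standard library so it runs without Scrapy installed. The dataclasses `SoccerItem` and `MatchItem` are hypothetical stand-ins for `scrapy.Item` subclasses, and `DelegatingCSVWriter` is an illustrative name, not a Scrapy class. Each item is routed to the CSV writer registered for its exact class.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


# Hypothetical stand-ins for scrapy.Item subclasses.
@dataclass
class SoccerItem:
    team: str
    league: str


@dataclass
class MatchItem:
    home: str
    away: str
    score: str


class DelegatingCSVWriter:
    """Route each item to the CSV writer registered for its class."""

    def __init__(self, item_classes):
        self.buffers = {}
        self.writers = {}
        for cls in item_classes:
            # In-memory buffers keep the sketch self-contained; a real
            # pipeline would open one file per class instead.
            buf = io.StringIO()
            writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(cls)])
            writer.writeheader()
            self.buffers[cls] = buf
            self.writers[cls] = writer

    def process_item(self, item):
        # Look up the writer by the item's exact class; unknown classes are skipped.
        writer = self.writers.get(type(item))
        if writer is not None:
            writer.writerow(asdict(item))
        return item


pipeline = DelegatingCSVWriter([SoccerItem, MatchItem])
pipeline.process_item(SoccerItem(team="Ajax", league="Eredivisie"))
pipeline.process_item(MatchItem(home="Ajax", away="PSV", score="2-1"))
print(pipeline.buffers[MatchItem].getvalue())
```

Dispatching on `type(item)` keeps the lookup a single dictionary access; `isinstance` checks would be the alternative if item classes inherit from one another.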

Diomedes

Here is the code I used to build a MultiCSVItemPipeline, based on drcolossos's answer above.

This pipeline assumes that every Item class follows the *Item naming convention (e.g. TeamItem, EventItem). It creates one CSV file per type listed in SaveTypes (team.csv, event.csv, and so on) and writes each record to the matching file.
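The naming convention can be checked on its own. The sketch below is a variant of the `item_type` helper, not the author's version: it strips only a trailing `Item` with `str.removesuffix` (Python 3.9+), whereas `.replace('Item', '')` removes every occurrence of `Item` in the class name. The classes are hypothetical dict subclasses standing in for `scrapy.Item`.

```python
def item_type(item):
    """Map an item's class name to a CSV base name, e.g. TeamItem -> team."""
    # Strip only the trailing "Item", then lowercase.
    return type(item).__name__.removesuffix("Item").lower()


# Hypothetical item classes following the *Item convention.
class TeamItem(dict):
    pass


class EventItem(dict):
    pass


class LineItemItem(dict):
    # .replace('Item', '') would map this to "line"; removesuffix gives "lineitem".
    pass


for item in (TeamItem(), EventItem(), LineItemItem()):
    print(item_type(item) + ".csv")
```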

import os

from scrapy.exporters import CsvItemExporter

# Output directory for the CSV files; set this to suit your project.
CSVDir = 'output'


def item_type(item):
    return type(item).__name__.replace('Item', '').lower()  # TeamItem => team


class MultiCSVItemPipeline:
    SaveTypes = ['team', 'club', 'event', 'match']

    def open_spider(self, spider):
        # Scrapy calls open_spider/close_spider on every enabled pipeline,
        # so no manual signal wiring is needed.
        os.makedirs(CSVDir, exist_ok=True)
        # CsvItemExporter writes bytes, so the files are opened in binary mode.
        self.files = {name: open(os.path.join(CSVDir, name + '.csv'), 'wb')
                      for name in self.SaveTypes}
        self.exporters = {name: CsvItemExporter(f) for name, f in self.files.items()}
        for exporter in self.exporters.values():
            exporter.start_exporting()

    def close_spider(self, spider):
        for exporter in self.exporters.values():
            exporter.finish_exporting()
        for f in self.files.values():
            f.close()

    def process_item(self, item, spider):
        what = item_type(item)
        # Items whose type is not in SaveTypes pass through without being exported.
        if what in self.exporters:
            self.exporters[what].export_item(item)
        return item
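A pipeline only runs once it is listed in the project's `ITEM_PIPELINES` setting. A minimal `settings.py` fragment, assuming the class lives in a module named `myproject.pipelines` (a hypothetical path; use your project's own):

```python
# settings.py
# The module path is hypothetical; point it at wherever MultiCSVItemPipeline is defined.
# The integer (conventionally 0-1000) orders enabled pipelines; lower values run first.
ITEM_PIPELINES = {
    "myproject.pipelines.MultiCSVItemPipeline": 300,
}
```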