Scrapy Python CSV output has blank lines between each row

自闭症患者 2021-01-02 23:39

I am getting unwanted blank lines between each row of Scrapy output in the resulting CSV file.

I have moved from Python 2 to Python 3, and I use Windows 10.

2 answers
  • 2021-01-03 00:12

    I suspect it is not ideal, but I have found a workaround for this problem. In the pipelines.py file I added code that reads the CSV file with the blank lines into a list, removes the blank rows, and then writes the cleaned list to a new file.

    The code I added is:

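    # newline='' is what the csv module docs recommend for files passed to csv.reader/csv.writer;
    # utf-8 matches CsvItemExporter's default output encoding
    with open('%s_items.csv' % spider.name, 'r', newline='', encoding='utf-8') as f:
        reader = csv.reader(f)
        original_list = list(reader)
        # csv.reader yields [] for each blank line, and filter(None, ...) drops those empty rows
        cleaned_list = list(filter(None, original_list))
    
    with open('%s_items_cleaned.csv' % spider.name, 'w', newline='', encoding='utf-8') as output_file:
        wr = csv.writer(output_file, dialect='excel')
        for data in cleaned_list:
            wr.writerow(data)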
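To see why filter(None, ...) is enough for the cleanup, here is a small in-memory sketch. It assumes the broken file ends each row with "\r\r\n" (the pattern typically produced on Windows), and drop_blank_rows is a hypothetical helper name, not part of the original pipeline. csv.reader yields an empty list for every blank line, and an empty list is falsy, so filter(None, ...) keeps only the real rows.

```python
import csv
import io


def drop_blank_rows(text):
    """Parse CSV text and return only the non-empty rows (hypothetical helper)."""
    # newline='' keeps line endings untranslated so csv.reader sees them as-is;
    # "\r\r\n" is then split into "...\r" and a blank "\r\n" line.
    rows = csv.reader(io.StringIO(text, newline=''))
    # Blank lines come back as [], which is falsy, so filter(None, ...) drops them.
    return list(filter(None, rows))


# Simulated Windows output: every row is followed by an extra blank line.
sample = "plotid,plotprice\r\r\n1,100\r\r\n2,250\r\r\n"
print(drop_blank_rows(sample))
# [['plotid', 'plotprice'], ['1', '100'], ['2', '250']]
```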
    

    So the entire pipelines.py file is:

    # -*- coding: utf-8 -*-
    
    # Define your item pipelines here
    #
    # Don't forget to add your pipeline to the ITEM_PIPELINES setting
    # See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
    
    import csv
    from scrapy import signals
    from scrapy.exporters import CsvItemExporter
    
    class CSVPipeline(object):
    
        def __init__(self):
            self.files = {}
    
        @classmethod
        def from_crawler(cls, crawler):
            pipeline = cls()
            crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
            crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
            return pipeline
    
        def spider_opened(self, spider):
            # CsvItemExporter expects a file opened in binary mode under Python 3
            file = open('%s_items.csv' % spider.name, 'w+b')
            self.files[spider] = file
            self.exporter = CsvItemExporter(file)
            self.exporter.fields_to_export = ["plotid", "plotprice", "plotname", "name", "address"]
            self.exporter.start_exporting()
    
        def spider_closed(self, spider):
            self.exporter.finish_exporting()
            file = self.files.pop(spider)
            file.close()
    
            # Given I am using Windows, I need to eliminate the blank lines in the CSV file
            print("Starting csv blank line cleaning")
            with open('%s_items.csv' % spider.name, 'r', newline='', encoding='utf-8') as f:
                reader = csv.reader(f)
                original_list = list(reader)
                # csv.reader yields [] for each blank line, and filter(None, ...) drops those empty rows
                cleaned_list = list(filter(None, original_list))
    
            with open('%s_items_cleaned.csv' % spider.name, 'w', newline='', encoding='utf-8') as output_file:
                wr = csv.writer(output_file, dialect='excel')
                for data in cleaned_list:
                    wr.writerow(data)
    
        def process_item(self, item, spider):
            self.exporter.export_item(item)
            return item
    
    
    class CharleschurchPipeline(object):
        def process_item(self, item, spider):
            return item
    

    Not ideal, but it solves the problem for now.
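If dropping CsvItemExporter is an option, the cleanup pass can be avoided altogether. The following is a sketch of a hypothetical NewlineSafeCsvPipeline (the name is illustrative, not from the original project) that writes rows with the standard library's csv.DictWriter to a file opened in text mode with newline='', as the csv module documentation recommends. It relies only on the open_spider, close_spider and process_item hooks that Scrapy calls on item pipelines, and it reuses the column list from fields_to_export above.

```python
import csv


class NewlineSafeCsvPipeline:
    """Hypothetical alternative to CSVPipeline that uses the csv module directly."""

    # Same columns as fields_to_export in the original pipeline.
    fields = ["plotid", "plotprice", "plotname", "name", "address"]

    def open_spider(self, spider):
        # Text mode with newline='' means Python does not translate the "\r\n"
        # that csv already writes, so rows never end in "\r\r\n" on Windows.
        self.file = open('%s_items.csv' % spider.name, 'w',
                         newline='', encoding='utf-8')
        # extrasaction='ignore' skips item keys that are not in `fields`;
        # missing keys are written as empty fields.
        self.writer = csv.DictWriter(self.file, fieldnames=self.fields,
                                     extrasaction='ignore')
        self.writer.writeheader()

    def close_spider(self, spider):
        self.file.close()

    def process_item(self, item, spider):
        # dict(item) works for plain dicts as well as scrapy.Item objects.
        self.writer.writerow(dict(item))
        return item
```

To try it, register it in settings.py, for example ITEM_PIPELINES = {'yourproject.pipelines.NewlineSafeCsvPipeline': 300}, where yourproject is a placeholder for the actual project module.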

  • 2021-01-03 00:16

    The b in w+b is most probably part of the problem, as it makes the file be treated as a binary file, so line breaks are written as-is.

    So the first step is to remove the b. Then, by adding U, you can also activate universal newline support (see: https://docs.python.org/3/glossary.html#term-universal-newlines).

    So the line in question should look like:

    file = open('%s_items.csv' % spider.name, 'Uw+')
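A caveat and a sketch of what is likely going on, assuming Python 3. open() rejects 'U' in combination with 'w' or '+' (it raises ValueError, and Python 3.11 dropped 'U' entirely). Also, as far as I can tell, CsvItemExporter needs the binary 'w+b' file because it writes through an io.TextIOWrapper that it creates around that file. When such a wrapper is created without newline='', every "\n" written is translated to os.linesep, so on Windows the "\r\n" that csv.writer already ends each row with becomes "\r\r\n", which shows up as an extra blank row. The sketch reproduces this with an in-memory buffer. write_rows is a hypothetical helper, and newline='\r\n' stands in for the Windows default so the effect shows up on any OS.

```python
import csv
import io


def write_rows(rows, newline):
    """Write rows through a TextIOWrapper and return the bytes that reach the binary buffer."""
    raw = io.BytesIO()
    # newline=None: "\n" -> os.linesep; newline='': no translation;
    # newline='\r\n': "\n" -> "\r\n" (what Windows does by default).
    text = io.TextIOWrapper(raw, encoding='utf-8', newline=newline,
                            write_through=True)
    csv.writer(text).writerows(rows)  # csv ends every row with "\r\n"
    text.flush()
    data = raw.getvalue()
    text.detach()  # release the buffer without closing it
    return data


rows = [["plotid", "plotprice"], ["1", "100"]]
print(write_rows(rows, newline='\r\n'))  # b'plotid,plotprice\r\r\n1,100\r\r\n'
print(write_rows(rows, newline=''))      # b'plotid,plotprice\r\n1,100\r\n'
```

Newer Scrapy versions appear to create that wrapper with newline='', so upgrading Scrapy may remove the blank lines without any cleanup step. It is worth checking the CsvItemExporter source of the installed version to confirm.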
    