I am getting unwanted blank lines between each row of scrapy output in the resulting csv output file.
I have moved from python2 to python 3, and I use Windows 10. I
i suspect not ideal but I have found a work around to this problem. In the pipelines.py file I have added more code that essentially reads the csv file with the blank lines to a list, and so removes the blank lines and then writes that cleaned list to a new file.
the code I added is:
with open('%s_items.csv' % spider.name, 'r') as f:
reader = csv.reader(f)
original_list = list(reader)
cleaned_list = list(filter(None,original_list))
with open('%s_items_cleaned.csv' % spider.name, 'w', newline='') as output_file:
wr = csv.writer(output_file, dialect='excel')
for data in cleaned_list:
wr.writerow(data)
and so the entire pipelines.py file is:
# -*- coding: utf-8 -*-
# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: http://doc.scrapy.org/en/latest/topics/item-pipeline.html
import csv
from scrapy import signals
from scrapy.exporters import CsvItemExporter
class CSVPipeline(object):
def __init__(self):
self.files = {}
@classmethod
def from_crawler(cls, crawler):
pipeline = cls()
crawler.signals.connect(pipeline.spider_opened, signals.spider_opened)
crawler.signals.connect(pipeline.spider_closed, signals.spider_closed)
return pipeline
def spider_opened(self, spider):
file = open('%s_items.csv' % spider.name, 'w+b')
self.files[spider] = file
self.exporter = CsvItemExporter(file)
self.exporter.fields_to_export = ["plotid","plotprice","plotname","name","address"]
self.exporter.start_exporting()
def spider_closed(self, spider):
self.exporter.finish_exporting()
file = self.files.pop(spider)
file.close()
#given I am using Windows i need to elimate the blank lines in the csv file
print("Starting csv blank line cleaning")
with open('%s_items.csv' % spider.name, 'r') as f:
reader = csv.reader(f)
original_list = list(reader)
cleaned_list = list(filter(None,original_list))
with open('%s_items_cleaned.csv' % spider.name, 'w', newline='') as output_file:
wr = csv.writer(output_file, dialect='excel')
for data in cleaned_list:
wr.writerow(data)
def process_item(self, item, spider):
self.exporter.export_item(item)
return item
class CharleschurchPipeline(object):
def process_item(self, item, spider):
return item
not ideal but solves the problem for now.
The b
in w+b
is most probably part of the problem as this will make the file being considered a binary file and so linebreaks are written as is.
So first step is to remove the b
. And then by adding U
you can also activate the Universal Newline support ( see: https://docs.python.org/3/glossary.html#term-universal-newlines )
So the line in question should look like:
file = open('%s_items.csv' % spider.name, 'Uw+')