How can i export scraped data to csv file in the right format?

前端未结

关注

 1  968

I made an improvement to my code according to this suggestion from @paultrmbrth. what i need is to scrape data from pages that are similar to this and this one and i want th

相关标签:

1条回答

面向向阳花

2021-01-15 05:19

You can extract the title using below

item = {}
item['Title'] = response.css("h3[itemprop='name'] a::text").extract_first()

For the CSV part you would need to create a FeedExports which can split each row into multiple rows

from itertools import zip_longest
from scrapy.contrib.exporter import CsvItemExporter


class NewLineRowCsvItemExporter(CsvItemExporter):

    def __init__(self, file, include_headers_line=True, join_multivalued=',', **kwargs):
        super(NewLineRowCsvItemExporter, self).__init__(file, include_headers_line, join_multivalued, **kwargs)

    def export_item(self, item):
        if self._headers_not_written:
            self._headers_not_written = False
            self._write_headers_and_set_fields_to_export(item)

        fields = self._get_serialized_fields(item, default_value='',
                                             include_empty=True)
        values = list(self._build_row(x for _, x in fields))

        values = [
            (val[0] if len(val) == 1 and type(val[0]) in (list, tuple) else val)
            if type(val) in (list, tuple)
            else (val, )
            for val in values]

        multi_row = zip_longest(*values, fillvalue='')

        for row in multi_row:
            self.csv_writer.writerow(row)

Then you need to assign the feed exporter in your settings

FEED_EXPORTERS = {
    'csv': '<yourproject>.exporters.NewLineRowCsvItemExporter',
}

Assuming you put the code in exporters.py file. The output will be as desired

Edit-1

To set the fields and their order you will need to define FEED_EXPORT_FIELDS in your settings.py

FEED_EXPORT_FIELDS = ['Title', 'Follows', 'Followed by', 'Edited into', 'Spun-off from', 'Spin-off', 'Referenced in',
                       'Featured in', 'Spoofed in', 'References', 'Spoofs', 'Version of', 'Remade as', 'Edited from',
                       'Features']

https://doc.scrapy.org/en/latest/topics/feed-exports.html#std:setting-FEED_EXPORT_FIELDS

0 讨论(0)