How can I export scraped data to a CSV file in the right format?

有刺的猬 2021-01-15 05:12

I made an improvement to my code according to this suggestion from @paultrmbrth. What I need is to scrape data from pages that are similar to this and this one, and I want th

1 answer
  • 2021-01-15 05:19

    You can extract the title with the following:

    item = {}
    item['Title'] = response.css("h3[itemprop='name'] a::text").extract_first()
    

    For the CSV part, you need to create a custom feed exporter that splits each item into multiple rows:

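    from itertools import zip_longest

    # Older Scrapy releases exposed this as scrapy.contrib.exporter;
    # current releases use scrapy.exporters.
    from scrapy.exporters import CsvItemExporter


    class NewLineRowCsvItemExporter(CsvItemExporter):
        """CSV exporter that writes list-valued fields one value per row."""

        def __init__(self, file, include_headers_line=True, join_multivalued=',', **kwargs):
            super().__init__(file, include_headers_line, join_multivalued, **kwargs)

        def export_item(self, item):
            # Write the header row once, before the first item.
            if self._headers_not_written:
                self._headers_not_written = False
                self._write_headers_and_set_fields_to_export(item)

            # Serialize every field, keeping empty ones so the columns stay aligned.
            fields = self._get_serialized_fields(item, default_value='',
                                                 include_empty=True)
            values = list(self._build_row(x for _, x in fields))

            # Normalize each value to a sequence: unwrap a singly nested list,
            # keep lists/tuples as they are, and wrap scalars in a 1-tuple.
            values = [
                (val[0] if len(val) == 1 and isinstance(val[0], (list, tuple)) else val)
                if isinstance(val, (list, tuple))
                else (val,)
                for val in values]

            # Transpose the columns into rows, padding shorter columns with ''.
            for row in zip_longest(*values, fillvalue=''):
                self.csv_writer.writerow(row)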
    

    Then register the exporter in your settings:

    FEED_EXPORTERS = {
        'csv': '<yourproject>.exporters.NewLineRowCsvItemExporter',
    }
    

    This assumes you put the code in an exporters.py file inside your project package. The output will then be in the desired format.
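
    The row-splitting step that the exporter relies on can be tried on its own, without Scrapy. The sketch below uses only the standard library; the helper names expand_item_to_rows and item_to_csv and the sample item are made up for illustration and are not part of Scrapy or of the original spider. It shows how zip_longest turns list-valued fields into aligned rows, padding the shorter columns with empty strings.

```python
import csv
import io
from itertools import zip_longest


def expand_item_to_rows(item, fields):
    """Turn one item with list-valued fields into several aligned rows."""
    columns = []
    for name in fields:
        value = item.get(name, '')
        # Scalars become one-element columns so zip_longest can align them.
        if not isinstance(value, (list, tuple)):
            value = (value,)
        columns.append(value)
    # Shorter columns are padded with '' so every row has the same width.
    return list(zip_longest(*columns, fillvalue=''))


def item_to_csv(item, fields):
    """Render the header plus the expanded rows as CSV text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(fields)
    writer.writerows(expand_item_to_rows(item, fields))
    return buffer.getvalue()


# Hypothetical item, shaped like what a spider might yield.
sample_item = {
    'Title': 'Example Movie',
    'Follows': ['Movie A', 'Movie B'],
    'References': ['Movie C', 'Movie D', 'Movie E'],
}
sample_fields = ['Title', 'Follows', 'References']
csv_text = item_to_csv(sample_item, sample_fields)
print(csv_text)
```

    The title appears only on the first row, and each list is spread down its own column, which is the layout the custom exporter is meant to produce.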

    Edit-1

    To set the fields and their order, define FEED_EXPORT_FIELDS in your settings.py:

    FEED_EXPORT_FIELDS = ['Title', 'Follows', 'Followed by', 'Edited into', 'Spun-off from', 'Spin-off', 'Referenced in',
                           'Featured in', 'Spoofed in', 'References', 'Spoofs', 'Version of', 'Remade as', 'Edited from',
                           'Features']
    

    https://doc.scrapy.org/en/latest/topics/feed-exports.html#std:setting-FEED_EXPORT_FIELDS
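
    If you are on Scrapy 2.1 or later, the field order can also be set per output file through the FEEDS setting instead of globally. A minimal sketch, assuming that version; the file name movie_connections.csv is a made-up placeholder and the field list is shortened:

```python
# settings.py (sketch; 'movie_connections.csv' is a hypothetical output path)
FEEDS = {
    'movie_connections.csv': {
        # The format name is looked up in FEED_EXPORTERS, so a custom
        # exporter registered under 'csv' is the one that gets used.
        'format': 'csv',
        # Column order for this file only.
        'fields': ['Title', 'Follows', 'Followed by'],
    },
}
```

    A global FEED_EXPORT_FIELDS is the simpler choice when there is only one output file.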
