Joining all rows of a CSV file that have the same 1st column value in Python

后悔当初 2021-01-03 16:27

I have a CSV file that goes something like this:

['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '',

3 Answers
  • 2021-01-03 16:31

    You should use itertools.groupby:

    t = [ 
    ['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+'],
    ['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', ''],
    ['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', ''],
    ['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', ''] 
    ]
    
    from itertools import groupby
    
    # groupby only groups adjacent rows, so the rows must be sorted first.
    # To speed things up, operator.itemgetter(0) can serve as the key
    # for both sorting and grouping.
    for name, rows in groupby(sorted(t), lambda x: x[0]):
        print(join_rows(rows))
    

    The merging itself belongs in a separate function, defined before the loop above runs. For example:

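    def join_rows(rows):
        def join_tuple(tup):
            # Return the first non-empty value in this column, or '' if none.
            for x in tup:
                if x:
                    return x
            return ''
        # zip(*rows) transposes the rows so each tuple holds one column.
        return [join_tuple(x) for x in zip(*rows)]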
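
Putting the two snippets together, a minimal self-contained sketch might look like the following. It uses operator.itemgetter as the key for both sorting and grouping, as the comment in the loop suggests. The sample rows are shortened for readability, and `merge_groups` is a hypothetical helper name, not part of the original answer.

```python
from itertools import groupby
from operator import itemgetter


def join_rows(rows):
    """Merge rows column by column, keeping the first non-empty cell."""
    return [next((cell for cell in column if cell), '') for column in zip(*rows)]


def merge_groups(table):
    """Hypothetical helper: merge all rows that share the same first column."""
    first_column = itemgetter(0)
    # groupby only groups adjacent rows, so sort by the same key first.
    ordered = sorted(table, key=first_column)
    return [join_rows(group) for _, group in groupby(ordered, key=first_column)]


# Shortened sample data (assumed layout: a name followed by value columns).
sample = [
    ['Name1', '', '', '+'],
    ['Name2', '', 'a', ''],
    ['Name1', 'b', '', ''],
]

merged = merge_groups(sample)
print(merged)
```

Sorting on the first column only keeps rows with the same name in their original order, because Python's sort is stable. Sorting whole rows, as `sorted(t)` does, may reorder them, which matters when two rows fill the same column.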
    
  • 2021-01-03 16:34

    You can also use defaultdict:

    >>> from collections import defaultdict
    >>> d = defaultdict(list)
    >>> for row in t:
    ...     d[row[0]].extend(row[1:])
    ...
    >>> d['Name1']
    ['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    

    Then join the columns for each name.
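
The flattened list shown for d['Name1'] no longer records where one row ends and the next begins, so joining columns from it requires knowing the row width. One possible variant, sketched here under the assumption that every row has the same number of columns, stores whole rows per name and then joins them column by column. The sample data is shortened, and `grouped` and `merged` are illustrative names.

```python
from collections import defaultdict

rows = [
    ['Name1', '', '', '+'],
    ['Name1', 'b', '', ''],
    ['Name2', '', 'a', ''],
]

# Collect the value columns of each row under its name, one list per row.
grouped = defaultdict(list)
for row in rows:
    grouped[row[0]].append(row[1:])

# For each name, take the first non-empty cell in every column.
merged = {
    name: [next((cell for cell in column if cell), '') for column in zip(*group)]
    for name, group in grouped.items()
}
print(merged)
```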

  • 2021-01-03 16:52

    def merge_rows(row1, row2):
        # merge two rows with the same name
        merged_row = ...
        return merged_row
    
    r1 = ['Name1', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '+']
    r2 = ['Name1', '', '', '', '', '', 'b', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    r3 = ['Name2', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', 'a', '']
    r4 = ['Name3', '', '', '', '', '+', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']
    rows = [r1, r2, r3, r4]
    data = {}
    for row in rows:
        name = row[0]
        if name in data:
            data[name] = merge_rows(row, data[name])
        else:
            data[name] = row
    

    All the rows are now in data: each key of the dictionary is a name, and the corresponding value is the merged row for that name. You can then write these values to a CSV file.
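
The answer leaves the body of merge_rows open. As one possible way to fill it in, the sketch below keeps, for each column, whichever of the two cells is non-empty (preferring the first row when both are filled), and then writes the result with csv.writer. This merge rule is an assumption rather than the answerer's stated intent. The sample rows are shortened, and an in-memory buffer stands in for a real output file.

```python
import csv
import io


def merge_rows(row1, row2):
    """One possible merge: per column, prefer the non-empty cell of row1."""
    return [a or b for a, b in zip(row1, row2)]


rows = [
    ['Name1', '', '', '+'],
    ['Name1', 'b', '', ''],
    ['Name2', '', 'a', ''],
]

data = {}
for row in rows:
    name = row[0]
    if name in data:
        data[name] = merge_rows(row, data[name])
    else:
        data[name] = row

# Write to an in-memory buffer; for a real file, use open(path, 'w', newline='').
buffer = io.StringIO()
csv.writer(buffer).writerows(data.values())
print(buffer.getvalue())
```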
