How to properly read a wrongly formatted string in a CSV file?

被撕碎了的回忆 2021-01-29 05:06

In my CSV file, the string in one column is ambiguous to parse. Because of that, I'm getting 6 values in the list instead of 5 as output.

Code:

1 answer
  •  感情败类
    2021-01-29 06:03

    This is almost embarrassingly hacky, but it seems to work, at least on the sample input shown in your question. It post-processes each row read by csv.reader, tries to detect fields that were read incorrectly because of the bad formatting, and then corrects them.

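    import csv

    def read_csv(filename):
        # newline='' is what the csv module expects for files opened in Python 3.
        with open(filename, newline='') as file:
            # Quote processing is turned off so the raw fields arrive with their quotes.
            for row in csv.reader(file, skipinitialspace=True,
                                  quoting=csv.QUOTE_NONE, quotechar=None):
                newrow = []
                use_a = True  # False when `a` was already merged into the previous field
                for a, b in zip(row, row[1:]):
                    # Detect bad formatting: one quoted field split at its comma.
                    if (a.startswith('"') and not a.endswith('"')
                            and not b.startswith('"') and b.endswith('"')):
                        # Join the misread fields back together.
                        newrow.append(', '.join((a, b)))
                        use_a = False
                    elif use_a:
                        newrow.append(a)
                    else:
                        use_a = True  # skip `a`; it is the tail of the merged field
                if use_a and row:
                    newrow.append(row[-1])  # the last field never appears as `a` in a pair
                yield [field.replace('""', '"').strip('"') for field in newrow]

    for row in read_csv('fmt_test2.csv'):
        print(row)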
    

    Output:

    ['1', '2', 'text1', 'Sample text "present" in csv, as this', '5']
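To try the same idea without a file on disk, the merge step can be pulled out into a helper that works on one already-split row. This is only a minimal sketch: the names `merge_split_fields` and `read_rows` are hypothetical, and the sample line is an assumption about what the badly formatted input might look like, since the question's data is not shown here.

```python
import csv

def merge_split_fields(row):
    """Rejoin adjacent fields that look like one quoted field split at a comma."""
    merged = []
    i = 0
    while i < len(row):
        field = row[i]
        nxt = row[i + 1] if i + 1 < len(row) else None
        if (nxt is not None
                and field.startswith('"') and not field.endswith('"')
                and not nxt.startswith('"') and nxt.endswith('"')):
            # Both halves are consumed, so step past the pair.
            merged.append(', '.join((field, nxt)))
            i += 2
        else:
            merged.append(field)
            i += 1
    # Collapse doubled quotes and drop the enclosing ones.
    return [f.replace('""', '"').strip('"') for f in merged]

def read_rows(lines):
    """Yield repaired rows from any iterable of CSV text lines."""
    reader = csv.reader(lines, skipinitialspace=True, quoting=csv.QUOTE_NONE)
    for row in reader:
        yield merge_split_fields(row)

# Assumed sample line: the inner quotes are not doubled, so the field is malformed.
sample = ['1, 2, text1, "Sample text "present" in csv, as this", 5']
for row in read_rows(sample):
    print(row)
```

Walking the row by index rather than by pairs makes it straightforward to skip both halves after a merge and to keep the last field of rows that need no repair. Like the original, it assumes the split happened at a comma followed by a single space.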
    
