How to read a wrongly formatted string in a CSV file properly?

In my CSV file, one column contains an ambiguously formatted string. Because of that, I'm getting 6 values in the list instead of the expected 5.

Code:<

1 answer

感情败类

2021-01-29 06:03

This is almost embarrassingly hacky, but it seems to work, at least on the sample input shown in your question. It post-processes each row read by csv.reader, tries to detect fields that were split incorrectly because of the bad formatting, and joins them back together.

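import csv

def read_csv(filename):
    # newline='' lets the csv module handle line endings itself (Python 3).
    with open(filename, newline='') as file:
        # Quoting is disabled, so quote characters arrive as ordinary text.
        for row in csv.reader(file, skipinitialspace=True,
                              quotechar=None, quoting=csv.QUOTE_NONE):
            newrow = []
            skip = False
            # Pad with '' so the last field is also visited as `a`.
            for a, b in zip(row, row[1:] + ['']):
                if skip:
                    # This field was already merged into the previous one.
                    skip = False
                    continue
                # Detect bad formatting: a quoted field split at an inner comma.
                if (a.startswith('"') and not a.endswith('"') and
                        not b.startswith('"') and b.endswith('"')):
                    # Join the misread halves back together.
                    newrow.append(', '.join((a, b)))
                    skip = True
                else:
                    newrow.append(a)
            # Collapse doubled quotes and drop the enclosing ones.
            yield [field.replace('""', '"').strip('"') for field in newrow]

for row in read_csv('fmt_test2.csv'):
    print(row)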

Output:

['1', '2', 'text1', 'Sample text "present" in csv, as this', '5']
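
The pairwise check in the answer rejoins a quoted field that was split into two pieces. Below is a minimal sketch of the same idea that also copes with a field split into more than two pieces (more than one inner comma). The names merge_split_fields, read_rows and SAMPLE are hypothetical, and the sample line is an assumption reconstructed from the output above, since the question's actual input is not shown. Like the answer, it assumes the original separator was a comma followed by a space.

```python
import csv
import io

# Assumed sample line (not taken from the question): the inner quotes are
# not escaped, and the quoted field itself contains a comma.
SAMPLE = '1, 2, text1, "Sample text "present" in csv, as this", 5\n'

def merge_split_fields(row):
    """Rejoin pieces of a quoted field that was split at its inner commas."""
    merged = []
    pending = None  # pieces of a quoted field that has not been closed yet
    for field in row:
        if pending is not None:
            pending.append(field)
            if field.endswith('"'):
                # Closing quote reached: glue the pieces back together.
                merged.append(', '.join(pending))
                pending = None
        elif field.startswith('"') and not (len(field) > 1 and field.endswith('"')):
            # Opening quote without a matching close in the same piece.
            pending = [field]
        else:
            merged.append(field)
    if pending is not None:
        # The quote was never closed: keep the pieces exactly as read.
        merged.extend(pending)
    # Collapse doubled quotes and drop the enclosing ones, as in the answer.
    return [f.replace('""', '"').strip('"') for f in merged]

def read_rows(text):
    """Split on commas with quoting disabled, then repair each row."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True,
                        quoting=csv.QUOTE_NONE)
    return [merge_split_fields(row) for row in reader]

for row in read_rows(SAMPLE):
    print(row)
# prints ['1', '2', 'text1', 'Sample text "present" in csv, as this', '5']
```

This is still a heuristic. A piece that happens to end with a quote right before an inner comma is treated as the end of the field, so such input stays ambiguous.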
