In my CSV file, one column holds a string that parses ambiguously. Because of that, I'm getting 6 values in the list instead of the expected 5.
This is almost embarrassingly hacky, but it seems to work, at least on the sample input shown in your question. It post-processes each row read by csv.reader, tries to detect fields that were split incorrectly because of the bad formatting, and then joins them back together.
import csv

def read_csv(filename):
    # newline='' is what the csv module expects for text-mode files (Python 3).
    with open(filename, newline='') as file:
        for row in csv.reader(file, skipinitialspace=True, quotechar=None):
            newrow = []
            skip_next = False  # True when the next field was already merged
            # Pair each field with its successor; the last one is paired with None.
            for a, b in zip(row, row[1:] + [None]):
                if skip_next:
                    skip_next = False
                    continue
                # Detect bad formatting.
                if (b is not None
                        and a.startswith('"') and not a.endswith('"')
                        and not b.startswith('"') and b.endswith('"')):
                    # Join misread fields back together.
                    newrow.append(', '.join((a, b)))
                    skip_next = True
                else:
                    newrow.append(a)
            # Unescape doubled quotes and drop the enclosing ones.
            yield [field.replace('""', '"').strip('"') for field in newrow]

for row in read_csv('fmt_test2.csv'):
    print(row)
Output:
['1', '2', 'text1', 'Sample text "present" in csv, as this', '5']
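
The pairwise check only rejoins a field that was split into exactly two pieces. If a quoted field could contain more than one comma, a rough sketch of the same idea is to keep collecting pieces until one ends with a closing quote. The names merge_split_fields and read_rows are made up for this illustration, the sample line is invented, and quoting=csv.QUOTE_NONE is used here as an explicit way to switch off quote handling. It relies on the same heuristic as above, so treat it as a starting point rather than a real parser.

```python
import csv
import io

def merge_split_fields(row):
    """Rejoin pieces of a quoted field that was split on embedded commas."""
    merged = []
    pending = None  # pieces of a quoted field whose closing quote is not seen yet
    for piece in row:
        if pending is not None:
            pending.append(piece)
            if piece.endswith('"'):
                # Closing quote found: glue the pieces back together.
                merged.append(', '.join(pending))
                pending = None
        elif piece.startswith('"') and not (len(piece) > 1 and piece.endswith('"')):
            # Opening quote without a closing one: start collecting pieces.
            pending = [piece]
        else:
            merged.append(piece)
    if pending is not None:
        # Unterminated quote: keep the pieces as they were read.
        merged.extend(pending)
    # Unescape doubled quotes and drop the enclosing ones.
    return [field.replace('""', '"').strip('"') for field in merged]

def read_rows(lines):
    """Yield repaired rows from any iterable of CSV lines (file or StringIO)."""
    reader = csv.reader(lines, skipinitialspace=True, quoting=csv.QUOTE_NONE)
    for row in reader:
        yield merge_split_fields(row)

if __name__ == '__main__':
    # Invented sample line with two commas inside one quoted field.
    sample = io.StringIO('1, 2, "text1", "one, two, three", 5\n')
    for row in read_rows(sample):
        print(row)
```

On the invented sample line this prints ['1', '2', 'text1', 'one, two, three', '5']. Like the original, it rejoins pieces with ', ', so it assumes exactly one space followed each embedded comma.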