A lot of questions have been already asked about this topic on SO.
(and many others).
Among the numerous answers, none of them was really helpful to me so far. If I missed
Read the csv using the tolerant python csv module, and fix the loaded file prior to handing it off to pandas, which will fails on the otherwise malformed csv data regardless of the csv engine pandas uses.
import pandas as pd
import csv
not_csv = """1,2,3,4,5
1,2,3,4,5,6
,,3,4,5
1,2,3,4,5,6,7
,2,,4
"""
with open('not_a.csv', 'w') as csvfile:
csvfile.write(not_csv)
d = []
with open('not_a.csv') as csvfile:
areader = csv.reader(csvfile)
max_elems = 0
for row in areader:
if max_elems < len(row): max_elems = len(row)
csvfile.seek(0)
for i, row in enumerate(areader):
# fix my csv by padding the rows
d.append(row + ["" for x in range(max_elems-len(row))])
df = pd.DataFrame(d)
print df
# the default engine
# provides "pandas.errors.ParserError: Error tokenizing data. C error: Expected 5 fields in line 2, saw 6 "
#df = pd.read_csv('Test.csv',header=None, engine='c')
# the python csv engine
# provides "pandas.errors.ParserError: Expected 6 fields in line 4, saw 7 "
#df = pd.read_csv('Test.csv',header=None, engine='python')
Preprocess file outside of python if concerned about extra code inside python creating too much python code.
Richs-MBP:tmp randrews$ cat test.csv
1,2,3
1,
2
1,2,
,,,
Richs-MBP:tmp randrews$ awk 'BEGIN {FS=","}; {print $1","$2","$3","$4","$5}' < test.csv
1,2,3,,
1,,,,
2,,,,
1,2,,,
,,,,