Pandas CSV file with occasional extra columns in the middle


孤街浪徒 2021-01-21 00:25

I'm processing thousands of ~100k-line CSV files that are produced by someone else. 9 times out of 10 the files have 8 columns and all is right with the world. The 10th

1 answer

野趣味 (OP)

2021-01-21 01:17

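If you want to drop the bad lines, you can use error_bad_lines=False (and warn_bad_lines=False if you want it to be quiet about it). Both flags were deprecated in pandas 1.3 and removed in 2.0, where on_bad_lines="skip" (or "warn") takes their place. On an older pandas: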

>>> !cat unclean.csv
A,B,C,D,E,F,G,H
A,B,C,D,E,F,G,H
A,B,C,D,E,F,Foo,Bar,G,H
A,B,C,D,E,F,G,H
A,B,C,D,E,F,Foo,Bar,G,H
A,B,C,D,E,F,G,H
A,B,C,D,E,F,G,H
>>> df = pd.read_csv("unclean.csv", error_bad_lines=False, header=None)
Skipping line 3: expected 8 fields, saw 10
Skipping line 5: expected 8 fields, saw 10

>>> df
   0  1  2  3  4  5  6  7
0  A  B  C  D  E  F  G  H
1  A  B  C  D  E  F  G  H
2  A  B  C  D  E  F  G  H
3  A  B  C  D  E  F  G  H
4  A  B  C  D  E  F  G  H
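
On newer pandas the same idea can be sketched with on_bad_lines. The first call below is the direct equivalent of the transcript above (drop the long rows quietly). The second call is an optional variation, not part of the original answer: it passes a callable (pandas 1.4+, engine="python" only) to keep those rows by cutting the extra fields out. The helper name drop_middle_extras is hypothetical, and the assumption that the extras always sit just before the last two fields comes only from the sample data.

```python
import io

import pandas as pd

# Same contents as unclean.csv above, inlined so the sketch is self-contained.
raw = """A,B,C,D,E,F,G,H
A,B,C,D,E,F,G,H
A,B,C,D,E,F,Foo,Bar,G,H
A,B,C,D,E,F,G,H
A,B,C,D,E,F,Foo,Bar,G,H
A,B,C,D,E,F,G,H
A,B,C,D,E,F,G,H
"""

# 1) Drop rows with too many fields, without printing a message per row.
skipped = pd.read_csv(io.StringIO(raw), header=None, on_bad_lines="skip")
print(skipped.shape)  # (5, 8)


# 2) Hypothetical helper: keep the first 6 and last 2 fields of a long row.
#    ASSUMPTION: the extra fields always appear just before the final two.
def drop_middle_extras(fields):
    return fields[:6] + fields[-2:]


# A callable for on_bad_lines is only supported by the Python parser engine.
repaired = pd.read_csv(
    io.StringIO(raw),
    header=None,
    engine="python",
    on_bad_lines=drop_middle_extras,
)
print(repaired.shape)
```

If the extra fields can show up at other positions, the callable needs a different rule; returning None from it skips the row instead.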
