Pandas CSV file with occasional extra columns in the middle

孤街浪徒 2021-01-21 00:25

I'm processing thousands of ~100k-line CSV files that are produced by someone else. 9 times out of 10 the files have 8 columns and all is right with the world. The 10th

1 answer
    野趣味 (OP)
    2021-01-21 01:17

    If you want to drop the bad lines, you might be able to use error_bad_lines=False (and warn_bad_lines=False if you want it to be quiet about it):

    >>> !cat unclean.csv
    A,B,C,D,E,F,G,H
    A,B,C,D,E,F,G,H
    A,B,C,D,E,F,Foo,Bar,G,H
    A,B,C,D,E,F,G,H
    A,B,C,D,E,F,Foo,Bar,G,H
    A,B,C,D,E,F,G,H
    A,B,C,D,E,F,G,H
    >>> import pandas as pd
    >>> df = pd.read_csv("unclean.csv", error_bad_lines=False, header=None)
    Skipping line 3: expected 8 fields, saw 10
    Skipping line 5: expected 8 fields, saw 10
    
    >>> df
       0  1  2  3  4  5  6  7
    0  A  B  C  D  E  F  G  H
    1  A  B  C  D  E  F  G  H
    2  A  B  C  D  E  F  G  H
    3  A  B  C  D  E  F  G  H
    4  A  B  C  D  E  F  G  H
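Note that error_bad_lines and warn_bad_lines were deprecated in pandas 1.3 and removed in pandas 2.0 in favour of a single on_bad_lines parameter. The sketch below assumes pandas 1.4 or later. It shows the equivalent of the call above using on_bad_lines="skip", plus an alternative that keeps the malformed rows by passing a callable to on_bad_lines, which requires engine="python". It reads the sample data from an in-memory string instead of unclean.csv. RAW, EXPECTED_FIELDS and drop_middle_extras are hypothetical names. The repair also assumes the extra fields always sit immediately before the last two columns, as they do in this sample.

```python
from __future__ import annotations

import io

import pandas as pd

# Sample data mirroring unclean.csv: rows 3 and 5 carry two extra
# fields (Foo, Bar) inserted between F and G.
RAW = """A,B,C,D,E,F,G,H
A,B,C,D,E,F,G,H
A,B,C,D,E,F,Foo,Bar,G,H
A,B,C,D,E,F,G,H
A,B,C,D,E,F,Foo,Bar,G,H
A,B,C,D,E,F,G,H
A,B,C,D,E,F,G,H
"""

EXPECTED_FIELDS = 8  # assumption: the "good" layout has 8 columns

# Option 1: drop the malformed rows silently
# (replacement for error_bad_lines=False, warn_bad_lines=False).
skipped = pd.read_csv(io.StringIO(RAW), header=None, on_bad_lines="skip")


# Option 2: repair the malformed rows instead of dropping them.
# Hypothetical helper; assumes the extra fields always sit just
# before the last two columns, as in the sample above.
def drop_middle_extras(fields: list[str]) -> list[str] | None:
    """Keep the first 6 and the last 2 fields of an over-long row."""
    if len(fields) < EXPECTED_FIELDS:
        return None  # defensive: returning None makes pandas skip the row
    return fields[: EXPECTED_FIELDS - 2] + fields[-2:]


# A callable on_bad_lines is only supported by the python engine.
repaired = pd.read_csv(
    io.StringIO(RAW),
    header=None,
    engine="python",
    on_bad_lines=drop_middle_extras,
)

print(skipped.shape)   # (5, 8)
print(repaired.shape)  # (7, 8)
```

Option 1 drops lines 3 and 5 just as error_bad_lines=False did; option 2 keeps all 7 rows with Foo and Bar stripped out. If the extra columns can turn up at different positions, the slicing in the helper would have to change to match.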
    
