Python Pandas Error tokenizing data

不知归路 2020-11-22 04:49

I'm trying to use pandas to manipulate a .csv file, but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 field
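A minimal reproduction, assuming the default C parser and a small hypothetical in-memory CSV (csv_text is made up for illustration). The parser raises ParserError when a data line has more fields than the header declares, and the message reports the expected field count, the line number, and how many fields it actually saw.

```python
import io

import pandas as pd

# Hypothetical CSV: the header declares 2 columns, but line 3 has 3 fields.
csv_text = "a,b\n1,2\n3,4,5\n"

message = None
try:
    pd.read_csv(io.StringIO(csv_text))  # the default engine is the C parser
except pd.errors.ParserError as exc:
    message = str(exc)
    print(message)
```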

30 answers
  •  抹茶落季
    2020-11-22 05:41

    The issue for me was that a new column was being appended to my CSV during the day. The accepted answer's solution (error_bad_lines=False) would not work, because every row written after the new column appeared would be discarded.
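A sketch of why skipping bad lines loses data in this situation, using a hypothetical in-memory CSV whose header was written before the third column started appearing. It uses on_bad_lines="skip", which replaced error_bad_lines=False in pandas 1.3 (the old flag was removed in pandas 2.0), so it assumes pandas 1.3 or later.

```python
import io

import pandas as pd

# Hypothetical file: the header predates the new column, so every row
# written after the change has one field too many.
csv_text = "a,b\n1,2\n3,4\n5,6,7\n8,9,10\n"

# pandas >= 1.3 spelling of the old error_bad_lines=False.
df = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")

print(len(df))  # 2: only the rows written before the new column survive
```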

    The solution in this case was to use the usecols parameter of pd.read_csv(). This lets me specify only the columns I need to read from the CSV, and my Python code stays resilient to future CSV changes as long as a header row exists (and the column names do not change).

    usecols : list-like or callable, optional 
    
    Return a subset of the columns. If list-like, all elements must either
    be positional (i.e. integer indices into the document columns) or
    strings that correspond to column names provided either by the user in
    names or inferred from the document header row(s). For example, a
    valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar',
    'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1,
    0]. To instantiate a DataFrame from data with element order preserved
    use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for
    columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo',
    'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.
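The ordering rules quoted above can be checked with a small hypothetical CSV held in memory; this sketch borrows the foo/bar/baz column names from the documentation excerpt.

```python
import io

import pandas as pd

csv_text = "foo,bar,baz\n1,2,3\n4,5,6\n"

# Element order in usecols is ignored: columns come back in file order.
df = pd.read_csv(io.StringIO(csv_text), usecols=["bar", "foo"])
print(list(df.columns))  # ['foo', 'bar']

# Selecting the columns again afterwards imposes your own order.
df_ordered = pd.read_csv(io.StringIO(csv_text), usecols=["foo", "bar"])[["bar", "foo"]]
print(list(df_ordered.columns))  # ['bar', 'foo']

# Positional indices behave the same way: [1, 0] selects what [0, 1] selects.
df_pos = pd.read_csv(io.StringIO(csv_text), usecols=[1, 0])
print(list(df_pos.columns))  # ['foo', 'bar']
```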
    

    Example

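    import pandas as pd
    file_path = "data.csv"  # hypothetical path to your CSV
    my_columns = ['foo', 'bar', 'bob']  # only these columns are parsed
    df = pd.read_csv(file_path, usecols=my_columns)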
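A sketch of the resilience argument, using two hypothetical in-memory versions of the same export (v1 and v2 stand in for file_path before and after a column is appended to the header). The callable form of usecols mentioned in the documentation excerpt is shown too; the extra name maybe_missing is made up to illustrate that a callable, unlike a list of names, does not fail when a wanted column is absent.

```python
import io

import pandas as pd

my_columns = ["foo", "bar", "bob"]

# Two hypothetical versions of the same export: v2 has a column appended.
v1 = "foo,bar,bob\n1,2,3\n"
v2 = "foo,bar,bob,new_col\n1,2,3,4\n"

df1 = pd.read_csv(io.StringIO(v1), usecols=my_columns)
df2 = pd.read_csv(io.StringIO(v2), usecols=my_columns)
print(df1.equals(df2))  # True: the appended column is never kept

# A list of names raises ValueError if any name is missing from the header;
# a callable is asked "keep this column?" for each header name instead.
wanted = {"foo", "bar", "bob", "maybe_missing"}
df3 = pd.read_csv(io.StringIO(v1), usecols=lambda name: name in wanted)
print(list(df3.columns))  # ['foo', 'bar', 'bob']
```

The list form is stricter, which can be useful: a renamed column fails loudly instead of silently disappearing.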
    

    Another benefit is that far less data is loaded into memory when I only need 3-4 columns of a CSV that has 18-20 columns.
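A rough way to see the memory difference, assuming a hypothetical wide CSV with 18 integer columns generated in memory; DataFrame.memory_usage(deep=True) reports the bytes held by the index and each column of the result.

```python
import io

import pandas as pd

# Hypothetical wide CSV: 18 integer columns, 10,000 rows.
wide = pd.DataFrame({f"c{i}": range(10_000) for i in range(18)})
csv_text = wide.to_csv(index=False)

full = pd.read_csv(io.StringIO(csv_text))
subset = pd.read_csv(io.StringIO(csv_text), usecols=["c0", "c1", "c2"])

full_bytes = full.memory_usage(deep=True).sum()
subset_bytes = subset.memory_usage(deep=True).sum()
print(full_bytes, subset_bytes)
```

Actual savings depend on the dtypes involved; string (object) columns generally cost more per value than the int64 columns used here.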
