Python Pandas Error tokenizing data

不知归路 2020-11-22 04:49

I'm trying to use pandas to manipulate a .csv file, but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 field

30 answers
  • 2020-11-22 05:39

    The dataset I used had a lot of quote marks (") that were not part of the CSV quoting. I was able to fix the error by passing this parameter to read_csv():

    quoting=3 # 3 correlates to csv.QUOTE_NONE for pandas
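
    As a minimal sketch of what this parameter changes (the sample data below is made up for illustration, not taken from the original dataset): with csv.QUOTE_NONE the quote character is treated as ordinary text, so an unbalanced quote no longer swallows the rows that follow it.

```python
import csv
import io

import pandas as pd

# Hypothetical sample: the first data row contains a stray, unclosed quote.
raw = 'id,text\n1,"5 inch\n2,plain\n'

# QUOTE_NONE (the constant behind quoting=3) disables quote handling,
# so the quote mark stays in the field as a literal character.
df = pd.read_csv(io.StringIO(raw), quoting=csv.QUOTE_NONE)
print(df)
```

    Note that any quotes that really were meant as CSV quoting will also be kept as literal characters, so this is only appropriate when the file does not rely on quoted fields.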
    
  • 2020-11-22 05:39

    I had a similar case, and the following worked:

    train = pd.read_csv('input.csv', encoding='latin1', engine='python')

  • 2020-11-22 05:40

    For those who have a similar issue with Python 3 on Linux:

    pandas.errors.ParserError: Error tokenizing data. C error: Calling
    read(nbytes) on source failed. Try engine='python'.
    

    Try:

    df = pd.read_csv('file.csv', encoding='utf8', engine='python')
    
  • 2020-11-22 05:40

    I had a dataset with preexisting row numbers, so I used index_col:

    pd.read_csv('train.csv', index_col=0)
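
    A small sketch of the same call on in-memory data (the sample content is an assumption, since train.csv itself is not shown): the unnamed first column of row numbers becomes the index instead of a data column.

```python
import io

import pandas as pd

# Hypothetical file content: the first column holds saved row numbers
# and has an empty header cell.
raw = ",x,y\n0,1.5,a\n1,2.5,b\n"

# index_col=0 uses the first column as the row index.
df = pd.read_csv(io.StringIO(raw), index_col=0)
print(df)
```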
    
  • 2020-11-22 05:41

    The following sequence of commands works (I lose the first line of the data because header=None is not passed, but at least it loads):

    df = pd.read_csv(filename, usecols=range(0, 42))
    df.columns = ['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND', 'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS', 'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2', 'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6', 'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10', 'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14']

    The following does NOT work:

    df = pd.read_csv(filename, names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND', 'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS', 'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2', 'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6', 'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10', 'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'], usecols=range(0, 42))

    CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54

    The following does NOT work either:

    df = pd.read_csv(filename, header=None)

    CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54

    Hence, for your problem you have to pass usecols=range(0, 2).
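
    Before picking a usecols range, it can help to see which rows actually have an unexpected number of fields. The sketch below uses only the standard csv module; field_count_report is a hypothetical helper name and the sample data is made up, so treat it as one possible diagnostic rather than part of the fix above.

```python
import csv
import io
from collections import Counter


def field_count_report(lines, delimiter=","):
    """Return the most common field count and the rows that differ from it.

    Rows are numbered from 1 in the order csv.reader yields them.
    """
    counts = [
        (row_number, len(row))
        for row_number, row in enumerate(csv.reader(lines, delimiter=delimiter), start=1)
    ]
    expected = Counter(n for _, n in counts).most_common(1)[0][0]
    odd_rows = [(row_number, n) for row_number, n in counts if n != expected]
    return expected, odd_rows


# Hypothetical sample: the third row has one field too many.
sample = io.StringIO("a,b\n1,2\n3,4,5\n6,7\n")
expected, odd_rows = field_count_report(sample)
print(expected, odd_rows)
```

    For a real file, the same helper could be called on an open file handle, for example with open(filename, newline="") as f: field_count_report(f).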

  • 2020-11-22 05:41

    This is what I did.

    sep='::' solved my issue:

    data=pd.read_csv('C:\\Users\\HP\\Downloads\\NPL ASSINGMENT 2 imdb_labelled\\imdb_labelled.txt',engine='python',header=None,sep='::')
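
    A minimal sketch of the same idea on in-memory data (the sample rows are assumptions, not the contents of imdb_labelled.txt): a separator longer than one character is interpreted as a regular expression, which is why engine='python' is passed alongside it.

```python
import io

import pandas as pd

# Hypothetical sample using '::' as the field separator and no header row.
raw = "1::good film\n2::bad film\n"

# Multi-character separators need the Python parsing engine.
data = pd.read_csv(io.StringIO(raw), sep="::", engine="python", header=None)
print(data)
```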
    