Python Pandas Error tokenizing data

不知归路 2020-11-22 04:49

I'm trying to use pandas to manipulate a .csv file, but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 field
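A minimal way to reproduce this kind of error, assuming a made-up CSV whose header declares 2 columns but whose third line has 3 fields. In current pandas versions the exception is exposed as pandas.errors.ParserError:

```python
import io

import pandas as pd

# Hypothetical CSV: the header has 2 columns, but line 3 has 3 fields.
csv_text = "a,b\n1,2\n3,4,5\n6,7\n"

message = ""
try:
    pd.read_csv(io.StringIO(csv_text))
except pd.errors.ParserError as err:
    # The default C engine reports the expected/actual field counts and the line.
    message = str(err)

print(message)
```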

30 answers
  •  无人及你
    2020-11-22 05:45

    I had this problem as well, though perhaps for a different reason: some lines in my CSV had trailing commas, which added an extra column that pandas tried to read. The following works, but it simply skips the bad lines:

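    import pandas as pd

    # pandas >= 1.3; older versions used error_bad_lines=False (removed in pandas 2.0)
    data = pd.read_csv('file1.csv', on_bad_lines='skip')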
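    A minimal sketch of the on_bad_lines parameter (which replaced error_bad_lines in pandas 1.3) on a made-up in-memory CSV. 'skip' drops malformed rows; since pandas 1.4 a callable can be passed instead (python engine only) to collect those rows rather than discard them. The names bad_rows and keep_bad_row are illustrative, not part of pandas:

```python
import io

import pandas as pd

# Hypothetical CSV: the header has 2 columns, but line 3 has an extra field.
csv_text = "a,b\n1,2\n3,4,5\n6,7\n"

# Drop malformed rows (on_bad_lines requires pandas >= 1.3).
skipped = pd.read_csv(io.StringIO(csv_text), on_bad_lines="skip")

# Keep malformed rows for later inspection (callable form: pandas >= 1.4,
# engine="python" only).
bad_rows = []


def keep_bad_row(fields):
    # 'fields' is the offending line, already split on the separator.
    bad_rows.append(fields)
    return None  # None tells pandas to leave this row out of the DataFrame


collected = pd.read_csv(
    io.StringIO(csv_text), on_bad_lines=keep_bad_row, engine="python"
)

print(skipped)
print(bad_rows)
```

    The callable form keeps the offending rows without the retry loop, at the cost of using the slower python engine.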
    

    If you want to keep the lines, an ugly hack for handling the errors is something like the following:

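    import pandas as pd

    line     = []   # 0-based row numbers to skip (the bad lines)
    expected = []   # number of fields pandas expected on each bad line
    saw      = []   # number of fields actually found on each bad line

    while True:
        try:
            data = pd.read_csv('file1.csv', skiprows=line)
            break
        except pd.errors.ParserError as e:
            message = str(e)
            errortype = message.split('.')[0].strip()
            if errortype == 'Error tokenizing data':
                # e.g. "C error: Expected 2 fields in line 3, saw 12"
                cerror = message.split(':')[1].strip().replace(',', '')
                nums = [n for n in cerror.split(' ') if n.isdigit()]
                expected.append(int(nums[0]))
                saw.append(int(nums[2]))
                line.append(int(nums[1]) - 1)  # 1-based line -> 0-based row
            else:
                print('Unknown Error - 222')
                raise  # re-raise instead of retrying the same read forever

    if line:
        # Handle the errors however you want
        pass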
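    The string splitting in the loop above is its most fragile part. A minimal stdlib sketch of the same extraction using a regular expression, assuming the C parser's message contains "Expected N fields in line L, saw M" (parse_tokenizing_error is a hypothetical helper name):

```python
import re

# Assumed message shape, e.g.
# "Error tokenizing data. C error: Expected 2 fields in line 3, saw 12\n"
_PATTERN = re.compile(r"Expected (\d+) fields in line (\d+), saw (\d+)")


def parse_tokenizing_error(message):
    """Return (expected, zero_based_row, saw), or None if the message doesn't match."""
    match = _PATTERN.search(message)
    if match is None:
        return None
    expected, line_no, saw = (int(g) for g in match.groups())
    # The parser reports 1-based lines; skiprows takes 0-based row numbers.
    return expected, line_no - 1, saw


print(parse_tokenizing_error(
    "Error tokenizing data. C error: Expected 2 fields in line 3, saw 12\n"
))
```

    Returning None for non-matching messages lets the caller re-raise unknown errors instead of looping.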
    

    I then wrote a script to reinsert the bad lines into the DataFrame, using the row numbers collected in the variable 'line' in the code above. All of this can be avoided by reading the file with Python's built-in csv module instead. Hopefully the pandas developers will make this situation easier to handle in the future.
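    A minimal sketch of the csv module route, assuming the first row is the header and that a row counts as "bad" when its field count differs from the header's (split_rows and the in-memory sample are made up for illustration). Bad rows are kept together with their 1-based line numbers instead of being dropped:

```python
import csv
import io


def split_rows(lines):
    """Separate well-formed rows from rows whose field count differs from the header's."""
    reader = csv.reader(lines)
    header = next(reader)
    good, bad = [], []
    for row in reader:
        if len(row) == len(header):
            good.append(row)
        else:
            # reader.line_num counts physical lines read so far (1-based).
            bad.append((reader.line_num, row))
    return header, good, bad


# Hypothetical data: line 3 has a trailing comma, giving it an extra empty field.
sample = io.StringIO("a,b\n1,2\n3,4,\n6,7\n")
header, good, bad = split_rows(sample)
print(header, good, bad)
```

    The good rows (still strings at this point) can then be passed to pd.DataFrame(good, columns=header), and the bad rows repaired or inspected separately.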
