CParserError: Error tokenizing data

Posted by 家住魔仙堡 on 2019-12-25 07:18:04

Question


I'm having trouble reading a CSV file:

import pandas as pd

df = pd.read_csv('Data_Matches_tekha.csv', skiprows=2)

I get:

pandas.io.common.CParserError: Error tokenizing data. C error: Expected 1 fields in line 526, saw 5

When I add sep=None to the read_csv call, I get another error:

Error: line contains NULL byte

I tried adding unicode='utf-8', and I even tried the csv reader, but nothing works with this file.

The CSV file looks totally fine; I checked it and I see nothing wrong with it.

Here are the errors I get:


Answer 1:


In your actual code, the line is:

>>> pandas.read_csv("Data_Matches_tekha.xlsx", sep=None)

You are trying to read an Excel file, not a plain-text CSV, which is why things are not working.

Excel files (xlsx) are stored in a binary container format (a ZIP archive of XML parts) and cannot be read as simple text files the way CSV files can. That is also why the parser reports a NULL byte.
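One way to check what a file really is, regardless of its extension, is to look at its contents. The sketch below assumes the usual .xlsx layout, in which the workbook part is stored as xl/workbook.xml inside the ZIP container; looks_like_xlsx is a hypothetical helper name used for illustration, not part of pandas.

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive that holds a workbook part.

    Hypothetical helper for illustration: it only inspects the container
    and does not validate the spreadsheet itself.
    """
    if not zipfile.is_zipfile(path):
        # Plain-text files such as real CSVs end up here.
        return False
    with zipfile.ZipFile(path) as archive:
        # Assumption: the workbook sits at its conventional location.
        return "xl/workbook.xml" in archive.namelist()
```

If this returns True for a file named with a .csv extension, the file was probably renamed rather than exported, and read_csv will not be able to parse it.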

You need to either convert the Excel file to CSV and then read the result (note: if you have multiple sheets, each sheet should be converted to its own CSV file), or read the Excel file directly.

To read it directly, you can use read_excel, or a library like xlrd, which is designed to read Excel files (xlrd 2.0 and later reads only the legacy .xls format); see "Reading/parsing Excel (xls) files with Python" for more information on that.
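The per-sheet conversion mentioned above could be sketched as follows. This is an illustration rather than a definitive recipe: csv_path_for_sheet and export_sheets_to_csv are hypothetical helper names, the "<workbook>_<sheet>.csv" naming scheme is an assumption, and reading .xlsx with pandas requires an Excel engine such as openpyxl to be installed.

```python
from pathlib import Path


def csv_path_for_sheet(xlsx_path, sheet_name):
    """Build an output path such as 'book_Sheet1.csv' next to the workbook."""
    source = Path(xlsx_path)
    return source.with_name(f"{source.stem}_{sheet_name}.csv")


def export_sheets_to_csv(xlsx_path):
    """Write every sheet of the workbook to its own CSV and return the paths."""
    # Imported here so csv_path_for_sheet stays usable without pandas installed.
    import pandas as pd

    # sheet_name=None makes read_excel return a dict of {sheet name: DataFrame}.
    sheets = pd.read_excel(xlsx_path, sheet_name=None)
    written = []
    for name, frame in sheets.items():
        target = csv_path_for_sheet(xlsx_path, name)
        frame.to_csv(target, index=False)  # index=False omits the row index column
        written.append(target)
    return written
```

Calling export_sheets_to_csv("Data_Matches_tekha.xlsx") would then produce one CSV per sheet, each of which can be read with read_csv.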




Answer 2:


If the file is an Excel file, use read_excel instead of read_csv:

import pandas as pd

df = pd.read_excel("Data_Matches_tekha.xlsx")



Answer 3:


I encountered the same error when I used to_csv to write some data and then read it in another script. I found an easy solution that bypasses pandas' read function: the pickle module.

pickle is part of Python's standard library, so no pip install is needed.

First, write your data with the code below:

import pickle

# path and variable_to_save are placeholders for your own file path and object
with open(path, 'wb') as output:
    pickle.dump(variable_to_save, output)

Finally, load your data in the other script using:

import pickle

# the handle is named source so it does not shadow the built-in input()
with open(path, 'rb') as source:
    data = pickle.load(source)

Note that if you want to read the saved data with a different Python version than the one that wrote it, you can specify the pickle protocol in the writing step with protocol=x, where x is a protocol number the reading version supports (for example, protocol=2 is the highest protocol Python 2 can read).
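A complete round trip with an explicit protocol might look like the sketch below. The sample dictionary and the temporary file location are made up for illustration; protocol=2 is chosen only to show where the argument goes.

```python
import os
import pickle
import tempfile

# Made-up sample data standing in for whatever you need to save.
records = {"team": "A", "scores": [1, 2, 3]}

# Hypothetical location: a file inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "records.pkl")

# Write with an explicit protocol so older readers can load the file.
with open(path, "wb") as output:
    pickle.dump(records, output, protocol=2)

# Read it back; the protocol is detected automatically when loading.
with open(path, "rb") as source:
    restored = pickle.load(source)
```

Only unpickle files you trust, since loading a pickle can execute arbitrary code.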

I hope this can be of any use.



Source: https://stackoverflow.com/questions/37505577/cparsererror-error-tokenizing-data
