I'm trying to use pandas to manipulate a .csv file, but I get this error:
pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 field
Simple resolution: open the CSV file in Excel and save it under a different name in CSV format. Then try importing it into Spyder again, and your problem should be resolved!
I've had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by "properly", I mean each row had the same number of separators or columns.
Typically it happened because I had opened the CSV in Excel and then saved it improperly. Even though the file extension was still .csv, the pure CSV format had been altered.
Any file saved with pandas to_csv will be properly formatted and shouldn't have that issue. But if you open it with another program, it may change the structure.
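A quick way to check whether a file has this problem is to count the fields on each row with Python's csv module and report the rows that differ from the first one. This is a minimal sketch that assumes plain comma-separated text; find_ragged_rows is a hypothetical helper, not a pandas feature.

```python
import csv
import io


def find_ragged_rows(lines):
    """Return (line_number, field_count) for every row whose field count
    differs from the first (header) row. `lines` is any iterable of text
    lines, such as an open file or io.StringIO. Blank lines count as 0 fields."""
    reader = csv.reader(lines)
    expected = len(next(reader))  # field count of the first row
    ragged = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num counts physical lines read so far
            ragged.append((reader.line_num, len(row)))
    return ragged


# Usage: the third line has one field too many
sample = io.StringIO("a,b\n1,2\n3,4,5\n6,7\n")
print(find_ragged_rows(sample))  # [(3, 3)]
```

For a real file, open it with `open("your_file.csv", newline="")` and pass the file object instead of the StringIO.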
Hope that helps.
Although not the case for this question, this error may also appear with compressed data. Explicitly setting the value of the compression kwarg resolved my problem.
result = pandas.read_csv(data_source, compression='gzip')
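By default read_csv uses compression='infer', which guesses the compression from the file extension (for example .gz). A gzipped file whose name does not end in a compression extension is therefore read as raw bytes. The sketch below reproduces that situation with a hypothetical file name, data.csv, in a temporary directory.

```python
import gzip
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as tmpdir:
    # Gzip-compressed content saved under a plain .csv name, so
    # compression='infer' cannot detect it from the extension
    path = os.path.join(tmpdir, "data.csv")
    with gzip.open(path, "wt", newline="") as f:
        f.write("a,b\n1,2\n3,4\n")

    # Stating the compression explicitly lets pandas decompress it
    df = pd.read_csv(path, compression="gzip")

print(df)
```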
I came across the same issue. Using pd.read_table() on the same source file seemed to work. I could not trace the reason for this, but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.
Edit: I found that this error creeps up when your file contains some text that does not have the same format as the actual data. This is usually header or footer information (more than one line, so treating a single line as the header doesn't work), which will not be separated by the same number of commas as your actual data when using read_csv. read_table uses a tab as its default delimiter, so each comma-separated line is read as a single field; this could circumvent the user's current error but introduce others.
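A small sketch of why read_table "works", using hypothetical in-memory data: read_csv rejects the row with an extra comma, while read_table (tab delimiter) sees one field per line, so nothing is inconsistent, but nothing is split into columns either.

```python
import io

import pandas as pd

# Comma-separated text whose third line has one field too many
raw = "a,b\n1,2\n3,4,5\n"

# read_csv (comma delimiter) rejects the inconsistent row
try:
    pd.read_csv(io.StringIO(raw))
    read_csv_failed = False
except pd.errors.ParserError as exc:
    read_csv_failed = True
    print("read_csv:", exc)

# read_table defaults to sep='\t'; with no tabs in the text, every line
# becomes a single field, so the field counts agree and parsing succeeds
df = pd.read_table(io.StringIO(raw))
print(df.shape)  # (2, 1)
print(df.columns.tolist())  # ['a,b']
```

The resulting single column still has to be split afterwards, which is the "introduce others" part.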
I usually get around this by moving the extra data into a separate file and then using the read_csv() method on the rest.
The exact solution might differ depending on your actual file, but this approach has worked for me in several cases.
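When the number of extra lines at the top and bottom is known, read_csv can also skip them directly with its skiprows and skipfooter parameters instead of moving them out by hand. A minimal sketch on hypothetical in-memory data with two preamble lines and one footer line; skipfooter is only supported by the python engine.

```python
import io

import pandas as pd

# Hypothetical export: two preamble lines and one footer line that
# do not follow the comma layout of the actual data
raw = (
    "Report generated by some tool\n"
    "Exported: today, by someone, somewhere, else\n"
    "a,b\n"
    "1,2\n"
    "3,4\n"
    "Total rows: 2\n"
)

# skiprows drops the first N lines, skipfooter drops the last N lines
df = pd.read_csv(io.StringIO(raw), skiprows=2, skipfooter=1, engine="python")
print(df)
```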
In my case the separator was not the default "," but Tab.
import pandas as pd
pd.read_csv('file_name.csv', sep='\\t', lineterminator='\\r', engine='python', header='infer')
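Note: "\t" did not work as suggested by some sources; "\\t" was required. (With engine='python', a sep longer than one character is interpreted as a regular expression, so the two-character string '\\t' is read as the regex for a tab.)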
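To illustrate the regex point on hypothetical in-memory data: a literal tab separator and the two-character string backslash + t (interpreted as a regular expression by the python engine) produce the same frame here. This sketch does not reproduce the original file, so it cannot say why "\t" failed there.

```python
import io

import pandas as pd

raw = "a\tb\n1\t2\n3\t4\n"

# One-character separator: a literal tab character
df_plain = pd.read_csv(io.StringIO(raw), sep="\t")

# Two-character string (backslash + t): interpreted as a regular
# expression matching a tab; regex separators use the python engine
df_regex = pd.read_csv(io.StringIO(raw), sep="\\t", engine="python")

print(df_plain.equals(df_regex))  # True
```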
An alternative that I have found to be useful in dealing with similar parsing errors uses the CSV module to re-route data into a pandas df. For example:
import csv
import os
import pandas as pd

path = 'C:/FileLocation/'
file = 'filename.csv'

# read the raw rows with the csv module; rows may have different lengths
with open(os.path.join(path, file), 'rt', newline='') as f:
    reader = csv.reader(f)
    # once contents are available, I then put them in a list
    csv_list = [row for row in reader]

# now pandas has no problem getting into a df (short rows are padded with missing values)
df = pd.DataFrame(csv_list)
I find the csv module to be a bit more robust to poorly formatted comma-separated files, since csv.reader does not require every row to have the same number of fields, and so I have had success with this route to address issues like these.
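The DataFrame built this way keeps the header as an ordinary data row and uses integer column labels. One rough extension, assuming the first row really is a header, promotes it to column names and invents placeholder names for any extra columns. The extra_N names are hypothetical, and all values stay as strings, as the csv module returns them.

```python
import csv
import io

import pandas as pd

# Hypothetical ragged input: the last row has one field more than the header
raw = "a,b\n1,2\n3,4,5\n"
rows = list(csv.reader(io.StringIO(raw)))

header, *body = rows
width = max(len(r) for r in rows)
# Pad the header with placeholder names so it covers the widest row
columns = header + [f"extra_{i}" for i in range(len(header), width)]

# Shorter rows are padded with missing values
df = pd.DataFrame(body, columns=columns)
print(df)
```

Numeric columns can then be converted with pd.to_numeric once the ragged rows have been inspected.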