Reading csv from pandas having both quotechar and delimiter for a column value

倾然丶 夕夏残阳落幕 submitted on 2019-12-31 03:08:08

Question


Here is the content of a CSV file, 'test.csv', which I am trying to read with pandas read_csv():

"col1", "col2", "col3", "col4"
"v1", "v2", "v3", "v4"
"v21", "v22", "v23", "this, "creating, what to do? " problems"

This is the command I am using:

messages = pd.read_csv('test.csv', sep=',', skipinitialspace=True)

But I am getting the following error:

CParserError: Error tokenizing data. C error: Expected 4 fields in line 3, saw 5

I want the content of col4 in line 3 to be 'this, "creating, what to do? " problems'.

How can I read a file when a column value can contain both the quotechar and the delimiter?


Answer 1:


pandas does not let you keep malformed rows, and to be honest I don't see a way to ignore some " characters but not others in your example. I think your intuition of using '", "' as the delimiter and then cleaning up is the best approach. Note that a sep longer than one character is interpreted as a regular expression and is only supported by the python engine. If you really want to do this in one line:

message = pd.read_csv('test.csv', sep='", "', engine='python', names=['col1', 'col2', 'col3', 'col4'], skiprows=1).apply(lambda x: x.str.strip('"'))

which sidesteps the quotes in the header row by skipping it and passing the column names explicitly, and gives you:

>>> message
  col1 col2 col3                                     col4
0   v1   v2   v3                                       v4
1  v21  v22  v23  this, "creating, what to do? " problems
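
If you would rather not hard-code the column names, something like the following might work. It is only a sketch under a few assumptions: the helper name read_quoted_csv is made up for illustration, the file content is held in an io.StringIO so the example is self-contained, and no value in the file contains the exact sequence '", "' (if one did, the split would produce an extra field).

```python
import io

import pandas as pd

# Same content as test.csv in the question, kept in memory for the demo.
raw = '''"col1", "col2", "col3", "col4"
"v1", "v2", "v3", "v4"
"v21", "v22", "v23", "this, "creating, what to do? " problems"
'''


def read_quoted_csv(source):
    """Hypothetical helper: split on '", "' and strip the leftover outer quotes."""
    # A multi-character sep is treated as a regex, which needs the python engine.
    # Every field is read as a string so that .str methods apply to all columns.
    df = pd.read_csv(source, sep='", "', engine='python', header=0, dtype=str)
    # After the split, only the first and last field of each row (and of the
    # header) still carry an outer quote; strip it from names and values.
    df.columns = [name.strip('"') for name in df.columns]
    return df.apply(lambda col: col.str.strip('"'))


messages = read_quoted_csv(io.StringIO(raw))
print(messages.loc[1, 'col4'])
```

Passing a file path such as 'test.csv' instead of the StringIO object should behave the same way, since read_csv accepts either.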


Source: https://stackoverflow.com/questions/35686920/reading-csv-from-pandas-having-both-quotechar-and-delimiter-for-a-column-value
