Load .csv with unknown delimiter into Pandas DataFrame

主宰稳场 submitted on 2019-12-10 19:12:58

Question


I have many .csv files to load into pandas DataFrames. They use at least two delimiters, comma and semicolon, and I am unsure what other delimiters may appear. I understand that the delimiter can be set using

dataRaw = pd.read_csv(name,sep=",")

and

dataRaw = pd.read_csv(name,sep=";")

Unfortunately, if I do not specify a delimiter, the default is a comma, which produces a single-column DataFrame for files that use any other delimiter. Is there a dynamic way to choose the delimiter so that any CSV can be passed to pandas, for example by trying comma and then semicolon? The pandas documentation does not mention using this kind of logic in read_csv.


Answer 1:


If you have different separators you can use:

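dataRaw = pd.read_csv(name, sep=";|,", engine="python")

Here ";|," is a regular expression that matches either separator, using the OR (|) operator. Separators longer than one character are handled by the Python parsing engine, so it is requested explicitly to avoid a fallback warning.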
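As a minimal sketch of the regex-separator approach, the snippet below reads two in-memory samples, one comma-delimited and one semicolon-delimited, with the same call. The sample strings and the helper name `read_either` are made up for illustration and are not part of the original answer.

```python
import io

import pandas as pd

# Hypothetical sample data: one "file" uses commas, the other semicolons.
comma_text = "a,b,c\n1,2,3\n"
semicolon_text = "a;b;c\n4;5;6\n"


def read_either(text):
    """Parse text whose fields are separated by ';' or ','."""
    # A separator longer than one character is treated as a regex.
    # engine="python" is passed explicitly because the C engine does not
    # support regex separators.
    return pd.read_csv(io.StringIO(text), sep=";|,", engine="python")


df_comma = read_either(comma_text)
df_semicolon = read_either(semicolon_text)

print(df_comma.columns.tolist())
print(df_semicolon.columns.tolist())
```

One caveat with this design: the pattern splits on every comma and every semicolon in the same file, so a semicolon-delimited file that contains commas inside its values (for example decimal commas) would end up with extra columns.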




Answer 2:


There is actually an answer in the pandas documentation (at least for pandas 0.20.1):

sep : str, default ‘,’

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used automatically. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'

This means you can read your files with just

dataRaw = pd.read_csv(name, sep = None, engine = 'python')

This should also work if your .csv files use separators other than ';' or ',' (for example, tab separators).
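A small sketch of that behaviour, assuming three made-up in-memory samples (comma, semicolon, and tab separated) in place of real files. The column names and values are hypothetical; only `sep=None` and `engine="python"` come from the answer.

```python
import io

import pandas as pd

# Hypothetical samples with the same content but different separators.
samples = {
    "comma": "id,city,score\n1,Oslo,3.5\n2,Bergen,4.0\n",
    "semicolon": "id;city;score\n1;Oslo;3.5\n2;Bergen;4.0\n",
    "tab": "id\tcity\tscore\n1\tOslo\t3.5\n2\tBergen\t4.0\n",
}

frames = {}
for label, text in samples.items():
    # sep=None asks the Python engine to detect the separator itself.
    frames[label] = pd.read_csv(io.StringIO(text), sep=None, engine="python")

for label, frame in frames.items():
    print(label, frame.columns.tolist(), frame.shape)
```

Detection is a heuristic, so it is worth checking the resulting column count on real files, especially ones whose values contain several candidate separator characters.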



Source: https://stackoverflow.com/questions/34359598/load-csv-with-unknown-delimiter-into-pandas-dataframe
