Load .csv with unknown delimiter into Pandas DataFrame

Question

I have many .csv files to load into pandas DataFrames. They use at least two delimiters, comma and semicolon, and I am unsure which other delimiters may appear. I understand that the delimiter can be set using

dataRaw = pd.read_csv(name,sep=",")

and

dataRaw = pd.read_csv(name,sep=";")

Unfortunately, if I do not specify a delimiter, the default is a comma, which produces a single-column DataFrame for files that use any other delimiter. Is there a dynamic way to choose the delimiter so that any CSV can be passed to pandas, such as trying a comma and then a semicolon? The pandas documentation doesn't mention using this kind of logic in read_csv.

Answer 1:

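If your files use different separators, you can pass a regular expression:

dataRaw = pd.read_csv(name, sep=";|,", engine="python")

Here ";|," is a regular expression that matches either separator, with the alternatives divided by the OR (|) operator. Regex separators are only supported by the Python parsing engine, so passing engine="python" explicitly avoids the warning pandas emits when it falls back from the C engine.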
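As a minimal sketch of the regex approach, the example below reads a comma-separated and a semicolon-separated sample with the same call. The sample strings and the helper name read_comma_or_semicolon are made up for illustration, and io.StringIO stands in for real files.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for two files on disk.
comma_text = "id,name,score\n1,ann,3.5\n2,bob,4.0\n"
semicolon_text = "id;name;score\n1;ann;3.5\n2;bob;4.0\n"


def read_comma_or_semicolon(source):
    """Read CSV data whose fields are split by either ',' or ';'."""
    # A separator longer than one character is treated as a regex,
    # which only the Python parsing engine supports.
    return pd.read_csv(source, sep=";|,", engine="python")


df_comma = read_comma_or_semicolon(io.StringIO(comma_text))
df_semicolon = read_comma_or_semicolon(io.StringIO(semicolon_text))

# Both inputs should parse into the same three-column frame.
print(df_comma.equals(df_semicolon))
```

Note that this splits on every comma and every semicolon in a line, so it is a poor fit for semicolon-delimited files that use decimal commas (3,5) or for files whose quoted fields contain either character.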

Answer 2:

There is actually an answer in the pandas documentation (at least for pandas 0.20.1):

sep : str, default ‘,’

Delimiter to use. If sep is None, the C engine cannot automatically detect the separator, but the Python parsing engine can, meaning the latter will be used automatically. In addition, separators longer than 1 character and different from '\s+' will be interpreted as regular expressions and will also force the use of the Python parsing engine. Note that regex delimiters are prone to ignoring quoted data. Regex example: '\r\t'

This means you can read your files with just

dataRaw = pd.read_csv(name, sep=None, engine='python')

This should also work if your .csv files use separators other than ';' or ',' (for example, tabs).
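To illustrate the automatic detection, here is a small sketch that reads three in-memory samples (comma, semicolon, and tab separated) with the same call. The sample data and the names samples and frames are assumptions made for this example; with real files you would pass a path instead of io.StringIO. As far as I know, pandas delegates the detection to Python's csv.Sniffer, so unusual layouts may still need an explicit sep.

```python
import io

import pandas as pd

# Hypothetical samples: the same table written with three different delimiters.
samples = {
    "comma": "id,name,score\n1,ann,3.5\n2,bob,4.0\n",
    "semicolon": "id;name;score\n1;ann;3.5\n2;bob;4.0\n",
    "tab": "id\tname\tscore\n1\tann\t3.5\n2\tbob\t4.0\n",
}

# sep=None asks the Python engine to detect the delimiter for each input.
frames = {
    label: pd.read_csv(io.StringIO(text), sep=None, engine="python")
    for label, text in samples.items()
}

for label, frame in frames.items():
    print(label, list(frame.columns), frame.shape)
```

The Python engine is slower than the C engine, so for very large files it may be worth detecting the delimiter once and then passing it explicitly.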

Source: https://stackoverflow.com/questions/34359598/load-csv-with-unknown-delimiter-into-pandas-dataframe

Tags

python

csv

pandas

delimiter