问题
I have some data that looks like this:
c stuff
c more header
c begin data
1 1:.5
1 2:6.5
1 3:5.3
I want to import it into a 3 column data frame, with columns e.g.
a , b, c
1, 1, 0.5
etc
I have been trying to read in the data as 2 columns split on ':', and then to split the first column on ' '. However I'm finding it irksome. Is there a better way to sort it out on import directly?
currently:
data1 = pd.read_csv(file_loc, skiprows = 3, delimiter = ':', names = ['AB', 'C'])
data2 = pd.DataFrame(data1.AB.str.split(' ',1).tolist(), names = ['A','B'])
However this is further complicated by the fact my data has a leading space...
I feel like this should be a simple task, but currently I'm thinking of reading it line by line and using some find replace to sanitise the data before importing.
回答1:
One way might be to use the regex separators permitted by the python engine. For example:
>>> !cat castle.dat
c stuff
c more header
c begin data
1 1:.5
1 2:6.5
1 3:5.3
>>> df = pd.read_csv('castle.dat', skiprows=3, names=['a', 'b', 'c'],
sep=' |:', engine='python')
>>> df
a b c
0 1 1 0.5
1 1 2 6.5
2 1 3 5.3
来源:https://stackoverflow.com/questions/26551662/import-text-to-pandas-with-multiple-delimiters