Question
I'm using Python and pandas to process a CSV file.
The CSV file has multiple header rows, like
Header1 Header2
Date Subheader1-1 Subheader1-2 Subheader2-1 Subheader2-2
In raw text, the content of the CSV file looks like
,Header1,,Header2,,...
Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2,...
...
My question is:
Does pandas support this sub-header format? If not, is there a way to read this CSV into a pandas DataFrame and do some calculations on it?
(The calculation would be something like extracting Header1's Subheader1-2 column, calculating its average and standard deviation, and plotting everything using matplotlib.)
Answer 1:
Use the parameter header=[0,1], but further processing is then necessary: replace the Unnamed labels in the first column level with NaN and then forward fill them:
from io import StringIO
import pandas as pd
temp=u''',Header1,,Header2,
Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2
2018-01-02,10,2,5,6'''
# pd.compat.StringIO is not available in current pandas versions; use io.StringIO
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), header=[0,1])
print(df)
Unnamed: 0_level_0 Header1 Unnamed: 2_level_0 Header2 \
Date Subheader1-1 Subheader1-2 Subheader2-1
0 2018-01-02 10 2 5
Unnamed: 4_level_0
Subheader2-2
0 6
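# take the first header level as a Series so string methods can be used on it
a = df.columns.get_level_values(0).to_series()
# turn 'Unnamed...' labels into NaN, forward fill from the left, blank out the rest
b = a.mask(a.str.startswith('Unnamed')).ffill().fillna('')
# rebuild the two-level columns from the cleaned first level and the original second level
df.columns = [b, df.columns.get_level_values(1)]
print(df)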
Header1 Header2
Date Subheader1-1 Subheader1-2 Subheader2-1 Subheader2-2
0 2018-01-02 10 2 5 6
A better solution is to create the index from the first column:
from io import StringIO
import pandas as pd
temp=u''',Header1,,Header2,
Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2
2018-01-02,10,2,5,6'''
# pd.compat.StringIO is not available in current pandas versions; use io.StringIO
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), header=[0,1], index_col=[0])
print(df)
Header1 Unnamed: 2_level_0 Header2 Unnamed: 4_level_0
Date Subheader1-1 Subheader1-2 Subheader2-1 Subheader2-2
2018-01-02 10 2 5 6
a = df.columns.get_level_values(0).to_series()
b = a.mask(a.str.startswith('Unnamed')).ffill().fillna('')
df.columns = [b, df.columns.get_level_values(1)]
print (df)
Header1 Header2
Date Subheader1-1 Subheader1-2 Subheader2-1 Subheader2-2
2018-01-02 10 2 5 6
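For the calculation mentioned in the question (the average and standard deviation of Header1's Subheader1-2), a minimal sketch could look like the following. It reuses the index_col approach above; the two extra data rows and the names csv_text, top, col, mean_value and std_value are made up for illustration and are not part of the original answer.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample data in the question's layout (the last two rows are invented).
csv_text = """,Header1,,Header2,
Date,Subheader1-1,Subheader1-2,Subheader2-1,Subheader2-2
2018-01-02,10,2,5,6
2018-01-03,12,4,7,8
2018-01-04,14,6,9,10"""

df = pd.read_csv(StringIO(csv_text), header=[0, 1], index_col=[0])

# Forward fill the 'Unnamed' top-level labels, as in the answer above.
top = df.columns.get_level_values(0).to_series()
top = top.mask(top.str.startswith('Unnamed')).ffill().fillna('')
df.columns = [top, df.columns.get_level_values(1)]

# A (top level, sub level) tuple selects a single column as a Series.
col = df[('Header1', 'Subheader1-2')]

mean_value = col.mean()
std_value = col.std()  # sample standard deviation (ddof=1) by default
print(mean_value, std_value)
```

If matplotlib is installed, col.plot() should draw the selected column; pass ddof=0 to std() if the population standard deviation is wanted instead.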
Source: https://stackoverflow.com/questions/51871136/pandas-how-to-read-sub-headers