Question
I've a dataframe which looks like this:
        some feature  another feature  label
sample
0                ...              ...    ...
and I'd like to get a dataframe with multiindexed columns like this:
        features          label
sample  some     another
0       ...      ...      ...
From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly. The solution shall not depend on string parsing of the feature columns (some feature, another feature). The label column is always the last column, and its column name label may be used. How can I get what I want?
Answer 1:
"From the API it's not clear to me how to use from_arrays(), from_product(), from_tuples() or from_frame() correctly."
These constructors are mainly used to build a new MultiIndex that is independent of the original column names. So use them if you need a completely new MultiIndex, e.g. built from lists or arrays:
a = ['a', 'a', 'b']
b = ['x', 'y', 'z']
df.columns = pd.MultiIndex.from_arrays([a, b])
print(df)
        a     b
        x  y  z
sample
0       2  3  5
1       4  5  7
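To make the difference between the four constructors concrete, here is a minimal sketch. The frame, its column names c1, c2, c3 and the level names top and sub are made up for illustration; only the constructor calls themselves are pandas API.

```python
import pandas as pd

# Hypothetical frame with three columns, used only for illustration.
df = pd.DataFrame(
    [[2, 3, 5], [4, 5, 7]],
    index=pd.Index([0, 1], name="sample"),
    columns=["c1", "c2", "c3"],
)

# from_arrays: one list per level, each as long as the number of columns.
by_arrays = pd.MultiIndex.from_arrays([["a", "a", "b"], ["x", "y", "z"]])

# from_tuples: one tuple per column, giving its label on every level.
by_tuples = pd.MultiIndex.from_tuples([("a", "x"), ("a", "y"), ("b", "z")])

# from_frame: each row of a helper frame describes one column.
by_frame = pd.MultiIndex.from_frame(
    pd.DataFrame({"top": ["a", "a", "b"], "sub": ["x", "y", "z"]})
)

# from_product: every combination of the given levels (2 x 2 = 4 entries),
# so it only fits when the columns form a full grid.
by_product = pd.MultiIndex.from_product([["a", "b"], ["x", "y"]])

df.columns = by_arrays
```

The first three calls describe the same three columns in different shapes, so which one to pick mostly depends on how the level labels are already stored. from_product() does not fit the frame above, because three columns cannot form a full 2 x 2 grid.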
EDIT1: If you want to put all columns except the last one under a common parent level in the same way:
a = ['parent'] * (len(df.columns) - 1) + ['label']
b = df.columns[:-1].tolist() + ['val']
df.columns = pd.MultiIndex.from_arrays([a, b])
print(df)
          parent           label
       feature a feature b   val
sample
0              2         3     5
1              4         5     7
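For the layout asked for in the question, the same idea can also be written with from_tuples(). This is a minimal sketch: the values are made up, and using an empty string as the second-level name of the label column is an assumption, one possible choice rather than something the question specifies.

```python
import pandas as pd

# Column names follow the question; the values are made up for illustration.
df = pd.DataFrame(
    {"some feature": [2, 4], "another feature": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# Every column except the last goes under "features". The names are reused
# as they are, so no string parsing is involved.
tuples = [("features", name) for name in df.columns[:-1]]
# Assumption: the label column gets an empty second-level name.
tuples.append((df.columns[-1], ""))

df.columns = pd.MultiIndex.from_tuples(tuples)
features = df["features"]  # sub-frame holding only the feature columns
```

The second level keeps the full original names. Shortening them to some and another would need an explicit rename or string parsing, which the question rules out.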
It is also possible with str.split, but columns without the separator get NaN in the second level, because MultiIndex and non-MultiIndex columns cannot be combined (actually they can, but then the MultiIndex columns turn into plain tuples):
print(df)
        feature_a  feature_b  label
sample
0               2          3      5
1               4          5      7
df.columns = df.columns.str.split('_', expand=True)
print(df)
       feature    label
             a  b   NaN
sample
0            2  3     5
1            4  5     7
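The NaN in the second level can be checked directly. This is a minimal sketch, assuming underscore-separated names as in the printed frame and made-up values. The separator is passed explicitly, because str.split() without a pattern splits on whitespace.

```python
import pandas as pd

df = pd.DataFrame(
    {"feature_a": [2, 4], "feature_b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# The separator is passed explicitly: these names contain no whitespace,
# so the default pattern would leave them unsplit.
df.columns = df.columns.str.split("_", expand=True)

# "label" had no separator, so its second-level entry is missing (NaN).
label_key = df.columns[-1]
missing_second_level = pd.isna(label_key[1])
```

A column key containing NaN is awkward to select by name, which is the reason for moving such columns out of the way first.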
So it is better to first move all columns without the separator into the Index/MultiIndex with DataFrame.set_index:
df = df.set_index('label')
df.columns = df.columns.str.split('_', expand=True)
print(df)
      feature
            a  b
label
5           2  3
7           4  5
To keep the original index instead of replacing it, use the append=True parameter:
df = df.set_index('label', append=True)
df.columns = df.columns.str.split('_', expand=True)
print(df)
             feature
                   a  b
sample label
0      5           2  3
1      7           4  5
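Put together as a self-contained sketch, again assuming underscore-separated column names and made-up values. The name out is only used here so the original df stays untouched.

```python
import pandas as pd

df = pd.DataFrame(
    {"feature_a": [2, 4], "feature_b": [3, 5], "label": [5, 7]},
    index=pd.Index([0, 1], name="sample"),
)

# Move the separator-free column into the row index, keeping "sample".
out = df.set_index("label", append=True)
# Only names containing the separator remain, so each splits into two parts.
out.columns = out.columns.str.split("_", expand=True)
```

The label values are still available afterwards through out.index.get_level_values("label"), or out.reset_index("label") can be used to turn them back into a column.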
Source: https://stackoverflow.com/questions/61229699/how-can-i-summarize-several-pandas-dataframe-columns-into-a-parent-column-name