问题
How can I melt a pandas data frame using multiple variable names and values? I have the following data frame that changes its shape in a for loop. In one of the for loop iterations, it looks like this:
ID Cat Class_A Class_B Prob_A Prob_B
1 Veg 1 2 0.9 0.1
2 Veg 1 2 0.8 0.2
3 Meat 1 2 0.6 0.4
4 Meat 1 2 0.3 0.7
5 Veg 1 2 0.2 0.8
I need to melt it in such a way that it looks like this:
ID Cat Class Prob
1 Veg 1 0.9
1 Veg 2 0.1
2 Veg 1 0.8
2 Veg 2 0.2
3 Meat 1 0.6
3 Meat 2 0.4
4 Meat 1 0.3
4 Meat 2 0.7
5 Veg 1 0.2
5 Veg 2 0.8
During the for loop the data frame will contain different number of classes with their probabilities. That is why I am looking for a general approach that is applicable in all my for loop iterations. I saw this question and this but they were not helpful!
回答1:
You need lreshape by dict
for specify categories:
d = {'Class':['Class_A', 'Class_B'], 'Prob':['Prob_A','Prob_B']}
df = pd.lreshape(df,d)
print (df)
Cat ID Class Prob
0 Veg 1 1 0.9
1 Veg 2 1 0.8
2 Meat 3 1 0.6
3 Meat 4 1 0.3
4 Veg 5 1 0.2
5 Veg 1 2 0.1
6 Veg 2 2 0.2
7 Meat 3 2 0.4
8 Meat 4 2 0.7
9 Veg 5 2 0.8
More dynamic solution:
Class = [col for col in df.columns if col.startswith('Class')]
Prob = [col for col in df.columns if col.startswith('Prob')]
df = pd.lreshape(df, {'Class':Class, 'Prob':Prob})
print (df)
Cat ID Class Prob
0 Veg 1 1 0.9
1 Veg 2 1 0.8
2 Meat 3 1 0.6
3 Meat 4 1 0.3
4 Veg 5 1 0.2
5 Veg 1 2 0.1
6 Veg 2 2 0.2
7 Meat 3 2 0.4
8 Meat 4 2 0.7
9 Veg 5 2 0.8
EDIT:
lreshape is now undocumented, but is possible in future will by removed (with pd.wide_to_long too).
Possible solution is merging all 3 functions to one - maybe melt
, but now it is not implementated. Maybe in some new version of pandas. Then my answer will be updated.
回答2:
Or you can try this by using str.contain
and pd.concat
DF1=df2.loc[:,df2.columns.str.contains('_A|Cat|ID')]
name=['ID','Cat','Class','Prob']
DF1.columns=name
DF2=df2.loc[:,df2.columns.str.contains('_B|Cat|ID')]
DF2.columns=name
pd.concat([DF1,DF2],axis=0)
Out[354]:
ID Cat Class Prob
0 1 Veg 1 0.9
1 2 Veg 1 0.8
2 3 Meat 1 0.6
3 4 Meat 1 0.3
4 5 Veg 1 0.2
0 1 Veg 2 0.1
1 2 Veg 2 0.2
2 3 Meat 2 0.4
3 4 Meat 2 0.7
4 5 Veg 2 0.8
回答3:
The top voted answer uses the undocumented lreshape
which may at some point get deprecated because of its similarity to pd.wide_to_long which is documented and can use directly here. By default suffix
matches only to numbers. You must change this to match characters (here I just used any character).
pd.wide_to_long(df, stubnames=['Class', 'Prob'], i=['ID', 'Cat'], j='DROPME', suffix='.')\
.reset_index()\
.drop('DROPME', axis=1)
ID Cat Class Prob
0 1 Veg 1 0.9
1 1 Veg 2 0.1
2 2 Veg 1 0.8
3 2 Veg 2 0.2
4 3 Meat 1 0.6
5 3 Meat 2 0.4
6 4 Meat 1 0.3
7 4 Meat 2 0.7
8 5 Veg 1 0.2
9 5 Veg 2 0.8
回答4:
You could also use pd.melt.
# Make DataFrame
df = pd.DataFrame({'ID' : [i for i in range(1,6)],
'Cat' : ['Veg']*2 + ['Meat']*2 + ['Veg'],
'Class_A' : [1]*5,
'Class_B' : [2]*5,
'Prob_A' : [0.9, 0.8, 0.6, 0.3, 0.2],
'Prob_B' : [0.1, 0.2, 0.4, 0.7, 0.8]})
# Make class dataframe and prob dataframe
df_class = df.loc[:, ['ID', 'Cat', 'Class_A', 'Class_B']]
df_prob = df.loc[:, ['ID', 'Cat', 'Prob_A', 'Prob_B']]
# Melt class dataframe and prob dataframe
df_class = df_class.melt(id_vars = ['ID',
'Cat'],
value_vars = ['Class_A',
'Class_B'],
value_name = 'Class')
df_prob = df_prob.melt(id_vars = ['ID',
'Cat'],
value_vars = ['Prob_A',
'Prob_B'],
value_name = 'Prob')
# Clean variable column so only 'A','B' is left in both dataframes
df_class.loc[:, 'variable'] = df_class.loc[:, 'variable'].str.partition('_')[2]
df_prob.loc[:, 'variable'] = df_prob.loc[:, 'variable'].str.partition('_')[2]
# Merge class dataframe with prob dataframe on 'ID', 'Cat', and 'variable';
# drop 'variable'; sort values by 'ID', 'Cat'
final = df_class.merge(df_prob,
how = 'inner',
on = ['ID',
'Cat',
'variable']).drop('variable', axis = 1).sort_values(by = ['ID',
'Cat'])
来源:https://stackoverflow.com/questions/45662403/melting-pandas-data-frame-with-multiple-variable-names-and-multiple-value-names