I have a pandas dataframe defined as follows:
import pandas as pd
headers = ['Group', 'Element', 'Case', 'Score', 'Evaluation']
Starting from the result data frame, you can get to the format you need in two steps:
# collapse multi index column to single level column
result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]
# split the idxmax column into two columns
result = result.assign(
max_score_element = result.idxmax_Score.str[0],
max_score_case = result.idxmax_Score.str[1]
).drop(columns='idxmax_Score')
result
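Because result comes from an earlier step, here is a self-contained sketch of the two steps. The rows in df are hypothetical, made up only to reproduce the shape of the output above, and result is built the same way as in the timing code further down.

```python
import pandas as pd

# Hypothetical rows: the question's real data is not shown; these are chosen
# only so the output has the same shape as the listing above.
df = pd.DataFrame(
    [
        ["A", 1, "x", 8.50, 0.95],
        ["A", 1, "y", 9.19, 0.41],
        ["A", 2, "x", 7.30, 0.62],
        ["B", 1, "y", 6.80, 0.10],
        ["B", 2, "x", 9.12, 0.55],
        ["B", 2, "y", 8.01, 0.33],
    ],
    columns=["Group", "Element", "Case", "Score", "Evaluation"],
)

# With (Element, Case) as the index, idxmax yields an (Element, Case) tuple
# per group, and the aggregated frame has two-level column labels.
result = (
    df.set_index(["Element", "Case"])
    .groupby("Group")
    .agg({"Score": ["max", "idxmax"], "Evaluation": "min"})
    .reset_index()
)

# Step 1: flatten the columns, e.g. ("Score", "max") -> "max_Score"
# and ("Group", "") -> "Group".
result.columns = [y + "_" + x if y != "" else x for x, y in result.columns]

# Step 2: unpack the idxmax tuples into two columns and drop the tuple column.
result = result.assign(
    max_score_element=result["idxmax_Score"].str[0],
    max_score_case=result["idxmax_Score"].str[1],
).drop(columns="idxmax_Score")

print(result)
```

On recent pandas with Python 3.6+, assign keeps its keyword arguments in the order given, so max_score_element may print before max_score_case, unlike the listing above.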
#Group max_Score min_Evaluation max_score_case max_score_element
#0 A 9.19 0.41 y 1
#1 B 9.12 0.10 x 2
An alternative, similar to @tarashypka's idea, starts from the original df and uses join. It may not be as efficient, but it is less verbose:
(df.groupby('Group')
.agg({'Score': 'idxmax', 'Evaluation': 'min'})
.set_index('Score')
.join(df.drop(columns='Evaluation'))
.reset_index(drop=True))
#Evaluation Group Element Case Score
#0 0.41 A 1 y 9.19
#1 0.10 B 2 x 9.12
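A runnable sketch of the join approach on the same hypothetical rows, next to a variation that is not from the original answer: select each group's best row directly with idxmax plus loc, then attach the group minimum of Evaluation with map. Both should give the same rows once the column names and order are aligned.

```python
import pandas as pd

# Hypothetical rows; only the column names come from the question.
df = pd.DataFrame(
    [
        ["A", 1, "x", 8.50, 0.95],
        ["A", 1, "y", 9.19, 0.41],
        ["A", 2, "x", 7.30, 0.62],
        ["B", 1, "y", 6.80, 0.10],
        ["B", 2, "x", 9.12, 0.55],
        ["B", 2, "y", 8.01, 0.33],
    ],
    columns=["Group", "Element", "Case", "Score", "Evaluation"],
)

# The join approach, using drop(columns=...) instead of a positional axis.
via_join = (
    df.groupby("Group")
    .agg({"Score": "idxmax", "Evaluation": "min"})  # best row label, min Evaluation
    .set_index("Score")                             # index by that row label
    .join(df.drop(columns="Evaluation"))            # pull in the rest of the best row
    .reset_index(drop=True)
)

# Variation (not from the original answer): pick the best rows directly,
# then map each group's minimum Evaluation onto them.
min_eval = df.groupby("Group")["Evaluation"].min()
via_loc = (
    df.loc[df.groupby("Group")["Score"].idxmax()]
    .drop(columns="Evaluation")
    .assign(min_Evaluation=lambda d: d["Group"].map(min_eval))
    .reset_index(drop=True)
)

print(via_join)
print(via_loc)

# Same content once the column name and order are aligned.
pd.testing.assert_frame_equal(
    via_loc.rename(columns={"min_Evaluation": "Evaluation"})[via_join.columns],
    via_join,
)
```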
Naive timing with the example data set:
%%timeit
(df.groupby('Group')
.agg({'Score': 'idxmax', 'Evaluation': 'min'})
.set_index('Score')
.join(df.drop(columns='Evaluation'))
.reset_index(drop=True))
# 100 loops, best of 3: 3.47 ms per loop
%%timeit
result = (
df.set_index(['Element', 'Case'])
.groupby('Group')
.agg({'Score': ['max', 'idxmax'], 'Evaluation': 'min'})
.reset_index()
)
result.columns = [y + '_' + x if y != '' else x for x, y in result.columns]
result = result.assign(
max_score_element = result.idxmax_Score.str[0],
max_score_case = result.idxmax_Score.str[1]
).drop(columns='idxmax_Score')
# 100 loops, best of 3: 7.61 ms per loop
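%%timeit is an IPython/Jupyter magic, so it will not run in a plain script. A rough equivalent with the standard library timeit module is sketched below, again on hypothetical rows. Absolute numbers depend on the machine and pandas version, and on a frame this small they mostly measure per-call overhead rather than how either approach scales.

```python
import timeit
from functools import partial

import pandas as pd

# Hypothetical rows; timings on real data will differ.
df = pd.DataFrame(
    [
        ["A", 1, "x", 8.50, 0.95],
        ["A", 1, "y", 9.19, 0.41],
        ["A", 2, "x", 7.30, 0.62],
        ["B", 1, "y", 6.80, 0.10],
        ["B", 2, "x", 9.12, 0.55],
        ["B", 2, "y", 8.01, 0.33],
    ],
    columns=["Group", "Element", "Case", "Score", "Evaluation"],
)


def via_join(frame):
    """The join-based alternative."""
    return (
        frame.groupby("Group")
        .agg({"Score": "idxmax", "Evaluation": "min"})
        .set_index("Score")
        .join(frame.drop(columns="Evaluation"))
        .reset_index(drop=True)
    )


def via_multiindex(frame):
    """The MultiIndex + idxmax approach, including both reshaping steps."""
    result = (
        frame.set_index(["Element", "Case"])
        .groupby("Group")
        .agg({"Score": ["max", "idxmax"], "Evaluation": "min"})
        .reset_index()
    )
    result.columns = [y + "_" + x if y != "" else x for x, y in result.columns]
    return result.assign(
        max_score_element=result["idxmax_Score"].str[0],
        max_score_case=result["idxmax_Score"].str[1],
    ).drop(columns="idxmax_Score")


# Roughly mirrors "100 loops, best of 3": 3 repeats of 100 calls each,
# keeping the fastest repeat and reporting milliseconds per call.
LOOPS = 100
timings_ms = {}
for name, fn in [("join", via_join), ("multiindex", via_multiindex)]:
    runs = timeit.repeat(partial(fn, df), repeat=3, number=LOOPS)
    timings_ms[name] = min(runs) / LOOPS * 1000
    print(f"{name}: {timings_ms[name]:.2f} ms per loop")
```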