Question
I have a dataframe that looks like this:
id   YearReleased  Artist          count
168  2015          Muse            1
169  2015          Rihanna         3
170  2015          Taylor Swift    2
171  2016          Jennifer Lopez  1
172  2016          Rihanna         3
173  2016          Underworld      1
174  2017          Coldplay        1
175  2017          Ed Sheeran      2
I want to get the maximum count for each year and then get the corresponding Artist name.
Something like this:
YearReleased  Artist
2015          Rihanna
2016          Rihanna
2017          Ed Sheeran
I have tried looping over the rows of the dataframe and building a dictionary with the year as the key and the artist as the value. But when I try to convert that dictionary to a dataframe, the keys are mapped to columns instead of rows.
Is there a better approach that avoids looping over the dataframe and uses a built-in pandas method instead?
Answer 1:
Look at idxmax, which returns the index label of the row holding the maximum in each group:
df.loc[df.groupby('YearReleased')['count'].idxmax()]
Out[445]:
    id  YearReleased      Artist  count
1  169          2015     Rihanna      3
4  172          2016     Rihanna      3
7  175          2017  Ed Sheeran      2
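As a self-contained sketch of the idxmax approach, the snippet below rebuilds the sample data from the question and keeps only the two columns that were asked for. The variable names `best_rows` and `result` are my own, and the frame is assumed to have the default integer index.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": [168, 169, 170, 171, 172, 173, 174, 175],
    "YearReleased": [2015, 2015, 2015, 2016, 2016, 2016, 2017, 2017],
    "Artist": ["Muse", "Rihanna", "Taylor Swift", "Jennifer Lopez",
               "Rihanna", "Underworld", "Coldplay", "Ed Sheeran"],
    "count": [1, 3, 2, 1, 3, 1, 1, 2],
})

# For each year, the index label of the first row with the largest count
best_rows = df.groupby("YearReleased")["count"].idxmax()

# Select those rows and keep only the requested columns
result = df.loc[best_rows, ["YearReleased", "Artist"]].reset_index(drop=True)
print(result)
```

Because idxmax returns a single label per group, this yields exactly one row per year even when several artists share the top count.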
Answer 2:
You can use groupby and transform:
idx = df.groupby(['YearReleased'])['count'].transform('max') == df['count']
and then use this boolean mask to select the rows:
df[idx]
Out[14]:
    id  YearReleased      Artist  count
1  169          2015     Rihanna      3
4  172          2016     Rihanna      3
7  175          2017  Ed Sheeran      2
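One difference worth knowing: the transform mask keeps every row that ties for the maximum, whereas idxmax keeps only the first. The sketch below uses made-up data with a tie in 2015 (not from the question) to show this; the names `all_ties` and `first_only` are illustrative.

```python
import pandas as pd

# Hypothetical data: Muse and Rihanna tie for the top count in 2015
df = pd.DataFrame({
    "YearReleased": [2015, 2015, 2015, 2016],
    "Artist": ["Muse", "Rihanna", "Taylor Swift", "Underworld"],
    "count": [3, 3, 2, 1],
})

# Broadcast each year's maximum back onto every row of that year
per_year_max = df.groupby("YearReleased")["count"].transform("max")

# Boolean mask: keeps all rows whose count equals the year's maximum
all_ties = df[df["count"] == per_year_max]

# idxmax: keeps only the first row with the maximum in each year
first_only = df.loc[df.groupby("YearReleased")["count"].idxmax()]

print(all_ties["Artist"].tolist())
print(first_only["Artist"].tolist())
```

Which behaviour you want depends on whether tied artists should all be reported or only one per year.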
Source: https://stackoverflow.com/questions/49263437/how-to-get-value-of-a-column-based-on-the-maximum-of-another-column-in-case-of-d