DataFrame: N largest indexes values (from level=1) to n columns

Submitted by 和自甴很熟 on 2019-12-13 00:28:59

Question


I am trying to convert a df like this:

import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A1', 'A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2'],
      'B': ['B1', 'B1', 'B2', 'B2', 'B3', 'B3', 'B4', 'B5', 'B6', 'B7', 'B7', 'B8', 'B8']})

by taking the n (here 2) values of B with the largest counts within each A and putting them into n columns, one row per A.

My way of doing it:

df = df.groupby(['A', 'B'])['A'].count()
df = df.groupby(level=0).nlargest(2).reset_index(level=0, drop=True)

which gives me a result that is close to what I need:
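The output that followed here is not preserved in this copy. As a rough sketch, assuming the sample df from above and a reasonably recent pandas, the two lines can be re-run as below. The names counts and top2 are mine and are used only so the input df is not overwritten.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A1', 'A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2'],
                   'B': ['B1', 'B1', 'B2', 'B2', 'B3', 'B3', 'B4', 'B5', 'B6', 'B7', 'B7', 'B8', 'B8']})

# count rows per (A, B) pair; the result is a Series with a two-level MultiIndex
counts = df.groupby(['A', 'B'])['A'].count()

# keep the 2 largest counts within each A; nlargest prepends the group key
# as an extra index level, which reset_index(level=0, drop=True) removes again
top2 = counts.groupby(level=0).nlargest(2).reset_index(level=0, drop=True)

print(top2)
```

On this data the remaining index pairs should be (A1, B1), (A1, B2), (A2, B7) and (A2, B8), each with a count of 2. The B labels still sit in index level 1 rather than in columns, which is the part the question is about.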

Now, the only methods I know for transforming a MultiIndex are:

df.reset_index(level=1)
df.unstack()

But they don't give me what I am looking for. Is there a DataFrame method that will do this for me, or do I need to work around it with apply? One way would be to loop over the values of df.index.get_level_values(level=1) for each level-0 key and put them into a new df with 2 columns. But this will break if a level-0 key has only one level-1 value.

Also: I don't care about the order of the nlargest results when the counts are tied.


Answer 1:


Use value_counts within each group, which sorts by count in descending order by default, select the top 2 index values with head, and then pass the resulting lists to the DataFrame constructor:

a = df.groupby('A')['B'].apply(lambda x: x.value_counts().head(2).index.tolist())
print (a)
A
A1    [B1, B3]
A2    [B7, B8]
Name: B, dtype: object
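For comparison, here is a sketch of the same top-2 selection without apply, calling SeriesGroupBy.value_counts directly and then numbering the values within each group. This is not from the original answer; the names vc, top2, pairs, rank and wide are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': ['A1', 'A1', 'A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2'],
                   'B': ['B1', 'B1', 'B2', 'B2', 'B3', 'B3', 'B4', 'B5', 'B6', 'B7', 'B7', 'B8', 'B8']})

# counts indexed by (A, B), sorted by count in descending order within each A
vc = df.groupby('A')['B'].value_counts()

# first 2 rows of each A group, i.e. its 2 most frequent B values
top2 = vc.groupby(level=0).head(2)

# turn the (A, B) index pairs into columns; going through the index avoids
# the name clash reset_index can hit when the Series itself is named 'B'
pairs = top2.index.to_frame(index=False)

# number the B values within each A: 1, 2, ...
pairs['rank'] = pairs.groupby('A').cumcount() + 1

# one row per A, one column per rank
wide = pairs.pivot(index='A', columns='rank', values='B').add_suffix('_nlargest')
print(wide)
```

Which of the tied values (B1, B2, B3 for A1) ends up in the result may depend on the pandas version, which is fine here since the question does not care about the order of ties.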

If you want to use your code:

df = df.groupby(['A', 'B'])['A'].count()
df = df.groupby(level=0).nlargest(2).reset_index(level=0, drop=True)

df = df.rename('C').reset_index().groupby('A')['B'].apply(list)
print (df)
A
A1    [B1, B2]
A2    [B7, B8]
Name: B, dtype: object

Then build the DataFrame from the lists in a (the output of the first snippet):

df1 = (pd.DataFrame(a.values.tolist(), index=a.index)
         .rename(columns=lambda x: x+1)
         .add_suffix('_nlargest'))
print (df1)
   1_nlargest 2_nlargest
A                       
A1         B1         B3
A2         B7         B8
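The question worries about groups that have fewer than n distinct B values. A minimal sketch of how the list-based approach behaves in that case follows, assuming a hypothetical helper top_n_to_columns and a small made-up sample in which A3 has a single B value. The reindex step is my addition, so that the result always has n columns.

```python
import pandas as pd

def top_n_to_columns(df, n=2):
    """Hypothetical helper: n most frequent B values per A, one column each."""
    # list of up to n most frequent B values for every A
    top = df.groupby('A')['B'].apply(lambda x: x.value_counts().head(n).index.tolist())
    # the constructor pads shorter lists with missing values;
    # reindex guarantees exactly n columns even if every list is shorter
    wide = pd.DataFrame(top.tolist(), index=top.index).reindex(columns=range(n))
    wide.columns = [f'{i + 1}_nlargest' for i in range(n)]
    return wide

# made-up sample: A3 has only one distinct B value
sample = pd.DataFrame({'A': ['A2', 'A2', 'A2', 'A3'],
                       'B': ['B7', 'B7', 'B8', 'B9']})
result = top_n_to_columns(sample)
print(result)
```

Here the second column for A3 is left as a missing value instead of raising an error, so no special handling is needed for short groups.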



Answer 2:


While @jezrael's answer is much faster and simpler (I will use it), this is what I came up with while I was working on the problem:

df = pd.DataFrame({'A': ['A1', 'A1', 'A1', 'A1', 'A1', 'A1', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2', 'A2'],
      'B': ['B1', 'B1', 'B2', 'B2', 'B3', 'B3', 'B4', 'B5', 'B6', 'B7', 'B7', 'B8', 'B8']})

df = df.groupby(['A', 'B'])['A'].count()
df = df.groupby(level=0).nlargest(2).reset_index(level=0, drop=True)
df = df.unstack()

df_new = pd.DataFrame(columns=['A', '1_largest', '2_largest'])

# iterate over the level-0 keys instead of hardcoding ['A1', 'A2']
for i, row in enumerate(df.index):
    # sort this group's counts in descending order (NaN counts sort last)
    top = df.loc[row].sort_values(ascending=False).index
    df_new.loc[i] = [row, top[0], top[1]]

# set_index returns a new DataFrame, so assign the result
df_new = df_new.set_index('A')


Source: https://stackoverflow.com/questions/50368090/dataframe-n-largest-indexes-values-from-level-1-to-n-columns
