pandas groupby sort descending order

离开以前 2020-12-28 13:02

By default, pandas groupby sorts the group keys. But I'd like to change the sort order. How can I do this?

I'm guessing that I can't apply a sort method to the returned gro

6 Answers
  • 2020-12-28 13:06

    Another example, showing how to preserve the original order or sort the groups in descending order:

    In [97]: import pandas as pd                                                                                                    
    
    In [98]: df = pd.DataFrame({'name':['A','B','C','A','B','C','A','B','C'],'Year':[2003,2002,2001,2003,2002,2001,2003,2002,2001]})
    
    # Default groupby operation (groups sorted by key, ascending):
    In [99]: for each in df.groupby("Year"): print(each)
    (2001,    Year name
    2  2001    C
    5  2001    C
    8  2001    C)
    (2002,    Year name
    1  2002    B
    4  2002    B
    7  2002    B)
    (2003,    Year name
    0  2003    A
    3  2003    A
    6  2003    A)
    
    # sort=False (groups appear in order of first occurrence):
    In [100]: for each in df.groupby("Year", sort=False): print(each)
    (2003,    Year name
    0  2003    A
    3  2003    A
    6  2003    A)
    (2002,    Year name
    1  2002    B
    4  2002    B
    7  2002    B)
    (2001,    Year name
    2  2001    C
    5  2001    C
    8  2001    C)
    
    In [106]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"]))                        
    Out[106]: 
            Year name
    Year             
    2003 0  2003    A
         3  2003    A
         6  2003    A
    2002 1  2002    B
         4  2002    B
         7  2002    B
    2001 2  2001    C
         5  2001    C
         8  2001    C
    
    In [107]: df.groupby(["Year"], sort=False).apply(lambda x: x.sort_values(["Year"])).reset_index(drop=True)
    Out[107]: 
       Year name
    0  2003    A
    1  2003    A
    2  2003    A
    3  2002    B
    4  2002    B
    5  2002    B
    6  2001    C
    7  2001    C
    8  2001    C
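If the goal is to iterate over the groups in descending key order, one option is to sort the rows by the key first and then group with sort=False. The following is a minimal sketch under that assumption; the shuffled toy data and the names `ordered` and `keys_desc` are illustrative and not part of the answer above.

```python
import pandas as pd

# Toy data with the years deliberately out of order.
df = pd.DataFrame({
    "name": ["B", "A", "C", "A", "C", "B"],
    "Year": [2002, 2003, 2001, 2003, 2001, 2002],
})

# Sort rows by the key in descending order, then group with sort=False so
# that groups are yielded in order of first appearance (descending here).
ordered = df.sort_values("Year", ascending=False, kind="stable")
keys_desc = [int(key) for key, _ in ordered.groupby("Year", sort=False)]
print(keys_desc)
```

The stable sort keeps rows with the same year in their original relative order.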
    
  • 2020-12-28 13:14

    As of pandas 0.18, one way to do this is to call sort_index on the aggregated result of the groupby.

    Here's an example:

    import numpy as np
    import pandas as pd

    np.random.seed(1)
    n = 10
    df = pd.DataFrame({'mygroups': np.random.choice(['dogs', 'cats', 'cows', 'chickens'], size=n),
                       'data': np.random.randint(1000, size=n)})

    grouped = df.groupby('mygroups', sort=False).sum()
    # sort_index returns a new DataFrame, so assign the result
    grouped = grouped.sort_index(ascending=False)
    print(grouped)

              data
    mygroups
    dogs      1831
    chickens  1446
    cats       933
    

    As you can see, the groupby column is now sorted in descending order, instead of the default, which is ascending.

  • 2020-12-28 13:20

    Do your groupby, and use reset_index() to make it back into a DataFrame. Then sort.

    grouped = df.groupby('mygroups').sum().reset_index()
    grouped = grouped.sort_values('mygroups', ascending=False)
    
  • 2020-12-28 13:25

    Similar to one of the answers above: chaining .sort_values() after the groupby aggregation lets you change the sort order. To sort a single aggregated column, it would look like this:

    df.groupby('group')['id'].count().sort_values(ascending=False)
    

    ascending=False sorts from high to low; the default sorts from low to high.

    *Be careful with some of these aggregations. For example, .size() and .count() can return different values, because .size() counts every row, including NaNs, while .count() counts only non-NaN values.

    See also: What is the difference between size and count in pandas?
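To make the size/count caveat concrete, here is a small sketch. The toy frame reuses the 'group' and 'id' column names from the one-liner above, but the values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy data: group "a" has one NaN id, group "b" has one NaN id.
df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b"],
    "id": [1.0, np.nan, 3.0, 4.0, np.nan],
})

# size() counts all rows per group, NaN included.
sizes = df.groupby("group")["id"].size().sort_values(ascending=False)

# count() counts only the non-NaN values per group.
counts = df.groupby("group")["id"].count().sort_values(ascending=False)

print(sizes)
print(counts)
```

Both results are sorted from high to low, but the numbers differ wherever a group contains NaNs.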

  • 2020-12-28 13:26

    You can call sort_values() on the DataFrame before you do the groupby. pandas preserves the order of rows within each group.

    In [44]: d.head(10)
    Out[44]:
                  name transcript  exon
    0  ENST00000456328          2     1
    1  ENST00000450305          2     1
    2  ENST00000450305          2     2
    3  ENST00000450305          2     3
    4  ENST00000456328          2     2
    5  ENST00000450305          2     4
    6  ENST00000450305          2     5
    7  ENST00000456328          2     3
    8  ENST00000450305          2     6
    9  ENST00000488147          1    11
    
    for _, a in d.head(10).sort_values(["transcript", "exon"]).groupby(["name", "transcript"]): print(a)
                  name transcript  exon
    1  ENST00000450305          2     1
    2  ENST00000450305          2     2
    3  ENST00000450305          2     3
    5  ENST00000450305          2     4
    6  ENST00000450305          2     5
    8  ENST00000450305          2     6
                  name transcript  exon
    0  ENST00000456328          2     1
    4  ENST00000456328          2     2
    7  ENST00000456328          2     3
                  name transcript  exon
    9  ENST00000488147          1    11
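The same idea works for descending order inside each group. Below is a minimal sketch with made-up data (the names `d`, `name` and `exon` mirror the example above, but the values are hypothetical): the rows are sorted descending first, and each group keeps that row order.

```python
import pandas as pd

# Hypothetical toy data, not the transcript table shown above.
d = pd.DataFrame({
    "name": ["x", "y", "x", "y", "x"],
    "exon": [1, 5, 3, 4, 2],
})

# Sort rows by exon descending, then group. The group keys are still sorted
# ascending by default, but rows inside each group keep the descending order.
exons_by_name = {
    name: group["exon"].tolist()
    for name, group in d.sort_values("exon", ascending=False).groupby("name")
}
print(exons_by_name)
```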
    
  • 2020-12-28 13:28

    This kind of operation is covered under hierarchical indexing. Check out the examples here

    When you groupby, you're making new indices. If you also pass a list to .agg(), you'll get multiple column levels. I was trying to figure this out and found this thread via Google.

    It turns out you can sort by passing sort_values a tuple that names the exact column you want sorted on.

    Try this:

    import numpy as np
    import pandas as pd

    # generate toy data
    ex = pd.DataFrame(np.random.randint(1, 10, size=(100, 3)), columns=['features', 'AUC', 'recall'])

    # pass a tuple naming the specific column to sort on; 'mean' or 'AUC' alone is not unique
    ex.groupby('features').agg(['mean', 'std']).sort_values(('AUC', 'mean'))
    

    This will output a df sorted by the AUC-mean column only.
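Since the question is about descending order, the same tuple can be combined with ascending=False. This is a sketch along the lines of the snippet above; the seeded generator and the names `summary` and `by_auc_desc` are assumptions added so the example is reproducible.

```python
import numpy as np
import pandas as pd

# Seeded toy data so the result is reproducible (the seed is arbitrary).
rng = np.random.default_rng(0)
ex = pd.DataFrame(
    rng.integers(1, 10, size=(100, 3)),
    columns=["features", "AUC", "recall"],
)

# agg with a list produces two-level columns such as ('AUC', 'mean').
summary = ex.groupby("features").agg(["mean", "std"])

# Sort by the ('AUC', 'mean') column from high to low.
by_auc_desc = summary.sort_values(("AUC", "mean"), ascending=False)
print(by_auc_desc)
```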
