How to sort a pandas DataFrame by values from several columns?

走了就别回头了 2020-12-02 09:33

I have the following data frame:

df = pandas.DataFrame([{'c1':3,'c2':10},{'c1':2, 'c2':30},{'c1':1,'c2':20},{'c1':2,'c2':15},{'c1':2,'c2'


        
7 Answers
  • 2020-12-02 10:10

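    If you run this code from a script file rather than an interactive session, assign the result back to the variable, because sorting returns a new DataFrame instead of modifying df. Note that DataFrame.sort was removed in pandas 0.20, so on current versions use sort_values:

    df = df.sort_values(['c1','c2'], ascending=[False,True])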
    
  • 2020-12-02 10:21

    DataFrame.sort was deprecated and has since been removed; use DataFrame.sort_values instead.

    >>> df.sort_values(['c1','c2'], ascending=[False,True])
       c1   c2
    0   3   10
    3   2   15
    1   2   30
    4   2  100
    2   1   20
    >>> df.sort(['c1','c2'], ascending=[False,True])
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/ampawake/anaconda/envs/pseudo/lib/python2.7/site-packages/pandas/core/generic.py", line 3614, in __getattr__
        return object.__getattribute__(self, name)
    AttributeError: 'DataFrame' object has no attribute 'sort'
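
    For anyone who wants to reproduce the result above, here is a minimal, self-contained sketch. The DataFrame literal in the question is cut off, so the data below is an assumption chosen to match the printed output rather than the asker's exact input.

```python
import pandas as pd

# Hypothetical data: picked to reproduce the output shown in this answer,
# since the question's original DataFrame literal is truncated.
df = pd.DataFrame({"c1": [3, 2, 1, 2, 2], "c2": [10, 30, 20, 15, 100]})

# Sort by c1 descending, then by c2 ascending within each c1 value.
result = df.sort_values(["c1", "c2"], ascending=[False, True])
print(result)
```

    The ascending list is matched to the column list by position, so both must have the same length.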
    
  • 2020-12-02 10:24

    Using sort can produce a deprecation warning (see the GitHub discussion), so you may want to use sort_values instead (see the pandas docs).

    Then your code can look like this:

    df = df.sort_values(by=['c1','c2'], ascending=[False,True])
    
  • 2020-12-02 10:29

    In my case, the accepted answer didn't work, because sort_values returns a new DataFrame and leaves f unchanged:

    f.sort_values(by=["c1","c2"], ascending=[False, True])

    Only assigning the result back to f worked as expected:

    f = f.sort_values(by=["c1","c2"], ascending=[False, True])
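
    A small sketch of the difference, using made-up data (the values are only for illustration). It also shows inplace=True, which is another way to get the same effect without reassigning.

```python
import pandas as pd

# Hypothetical example data, not taken from the question.
f = pd.DataFrame({"c1": [1, 3, 2], "c2": [5, 6, 7]})

# Without assignment the sorted copy is discarded; f keeps its original order.
f.sort_values(by=["c1", "c2"], ascending=[False, True])
unchanged = f["c1"].tolist()

# Option 1: bind the sorted result to a name.
g = f.sort_values(by=["c1", "c2"], ascending=[False, True])

# Option 2: sort in place; the call then returns None.
f.sort_values(by=["c1", "c2"], ascending=[False, True], inplace=True)
```

    Reassignment is usually the easier form to read and to chain with other calls.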
    
  • 2020-12-02 10:29

    I have found this to be really useful:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'A': list(range(0, 10)) * 2, 'B': np.random.randint(20, 30, 20)})

    # A ascending, B descending
    df.sort_values(**skw(columns=['A', '-B']))

    # A descending, B ascending
    df.sort_values(**skw(columns=['-A', '+B']))


    Note that unlike the standard by=, ascending= arguments, here each column name and its sort order are in the same place. As a result your code gets a lot easier to read and maintain.

    Note the actual call to .sort_values is unchanged; skw (sortkwargs) is just a small helper function that parses the column names and returns the usual by= and ascending= parameters for you. Pass it any other sort kwargs as you usually would. Copy/paste the following code into e.g. your local utils.py, then forget about it and just use it as above.

    # utils.py (or anywhere else convenient to import)
    def skw(columns=None, **kwargs):
        """Build sort_values kwargs from column names prefixed with '+' or '-'."""
        # a leading '-' means descending; '+' or no prefix means ascending
        ascending = [not col.startswith('-') for col in columns]
        # strip only the leading sign, so names containing '+' or '-' survive
        by = [col[1:] if col[0] in '+-' else col for col in columns]
        kwargs.update(by=by, ascending=ascending)
        return kwargs
    
  • 2020-12-02 10:30

    As far as I understand, the DataFrame.sort() method is deprecated in pandas > 0.18 (and was removed entirely in later releases). To solve your problem, use DataFrame.sort_values() instead:

    f.sort_values(by=["c1","c2"], ascending=[False, True])
    

    The output looks like this:

        c1   c2
         3   10
         2   15
         2   30
         2  100
         1   20
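
    The output above is printed without the index column. One way to get a similar display is sketched below; the data is an assumption reconstructed from the rows shown, and reset_index plus to_string(index=False) are just one possible choice, not necessarily how the output above was produced.

```python
import pandas as pd

# Hypothetical data reconstructed from the rows shown above.
f = pd.DataFrame({"c1": [3, 2, 1, 2, 2], "c2": [10, 30, 20, 15, 100]})

sorted_f = (
    f.sort_values(by=["c1", "c2"], ascending=[False, True])
     .reset_index(drop=True)  # replace the shuffled labels with 0..n-1
)

# Print the rows only, without the index column.
print(sorted_f.to_string(index=False))
```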
    