Rank data over a rolling window in a pandas DataFrame

伪装坚强ぢ 2020-12-20 19:16

I am new to Python and the pandas library, so apologies if this is a trivial question. I am trying to rank a time series over a rolling window of N days. I know there is a ra

2 Answers
  • 2020-12-20 20:01

    You can write a custom function and apply it over a rolling window in pandas with rolling().apply(). Calling NumPy's argsort() twice inside that function gives you the rank within the window:

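    import io
    import pandas as pd
    
    testdata = io.StringIO("""
    Date,A
    01-01-2013,100
    02-01-2013,85
    03-01-2013,110
    04-01-2013,60
    05-01-2013,20
    06-01-2013,40""")
    
    # header=0: the first non-blank line holds the column names
    df = pd.read_csv(testdata, header=0, index_col='Date')
    
    # Descending rank (1 = largest) of the last value in each window;
    # argsort().argsort() converts values into 0-based ascending ranks
    rollrank = lambda data: data.size - data.argsort().argsort()[-1]
    
    # raw=True passes each window to rollrank as a NumPy array
    df['rank'] = df['A'].rolling(3).apply(rollrank, raw=True)
    
    print(df)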
    

    results in:

                  A  rank
    Date                 
    01-01-2013  100   NaN
    02-01-2013   85   NaN
    03-01-2013  110   1.0
    04-01-2013   60   3.0
    05-01-2013   20   3.0
    06-01-2013   40   2.0
    
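A minimal NumPy-only sketch of why the double argsort in rollrank yields a rank; last_value_rank_desc is a hypothetical helper name introduced here for illustration. The first argsort() returns the indices that would sort the window; applying argsort() to that result gives each element's 0-based ascending rank. Subtracting the last element's ascending rank from the window size converts it into a 1-based descending rank (1 = largest). One caveat of this approach: tied values still receive distinct ranks, so ties are not averaged the way Series.rank averages them by default.

```python
import numpy as np


def last_value_rank_desc(window: np.ndarray) -> int:
    """Hypothetical helper: descending 1-based rank of the window's last element."""
    # argsort() gives the indices that would sort the window ascending;
    # argsort() of that result gives each element's 0-based ascending rank.
    ascending_ranks = window.argsort().argsort()
    # Turn the last element's ascending rank into a descending, 1-based rank.
    return int(window.size - ascending_ranks[-1])


window = np.array([85, 110, 60])
print(window.argsort())              # indices that sort the window
print(window.argsort().argsort())    # 0-based ascending rank of each element
print(last_value_rank_desc(window))  # rank of 60 within [85, 110, 60]
```

For the window [85, 110, 60], the ascending ranks are [1, 2, 0], so the last value (60) gets descending rank 3 - 0 = 3, matching the 04-01-2013 row above.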
  • 2020-12-20 20:15

    If you want to use the pandas built-in rank method (which adds options such as ascending), you can create a simple wrapper function for it

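    import pandas as pd
    
    def rank(array):
        # Rank the whole window in descending order and return
        # the rank of its last (most recent) value
        s = pd.Series(array)
        return s.rank(ascending=False).iloc[-1]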
    

    that can then be used as a custom rolling-window function.

    df['A'].rolling(3).apply(rank, raw=True)
    

    which outputs

    Date
    01-01-2013    NaN
    02-01-2013    NaN
    03-01-2013    1.0
    04-01-2013    3.0
    05-01-2013    3.0
    06-01-2013    2.0
    Name: A, dtype: float64
    

    (I'm assuming the df data structure from Rutger's answer)
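If you are on pandas 1.4 or later (an assumption; older versions lack this method), rolling windows also have a built-in rank method, which should make the wrapper unnecessary. The sketch below rebuilds the same df directly instead of parsing CSV, ranks each value within its trailing 3-row window in descending order, and uses a small made-up series, ties, to show how the method parameter (shared with Series.rank) changes the result for tied values.

```python
import pandas as pd

# Same data as in the answers above, built directly instead of from CSV.
df = pd.DataFrame(
    {"A": [100, 85, 110, 60, 20, 40]},
    index=pd.Index(
        ["01-01-2013", "02-01-2013", "03-01-2013",
         "04-01-2013", "05-01-2013", "06-01-2013"],
        name="Date",
    ),
)

# Requires pandas >= 1.4: Rolling.rank ranks the last value of each
# window against the other values in that same window.
df["rank"] = df["A"].rolling(3).rank(ascending=False)
print(df)

# Hypothetical example data with ties, to compare tie-handling methods.
ties = pd.Series([5, 5, 5, 3])
print(ties.rolling(3).rank(ascending=False).tolist())                # average
print(ties.rolling(3).rank(ascending=False, method="min").tolist())  # min
```

With the default method="average", three tied values in one window share rank 2.0; method="min" gives them 1.0 instead, while the argsort approach would assign them distinct ranks.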
