Reference values in the previous row with map or apply

别那么骄傲 2021-01-23 09:12

Given a dataframe df, I would like to generate a new variable/column for each row based on the values in the previous row. df is sorted so that the ord

2 answers
  • 2021-01-23 09:41

    If you just want to do a calculation based on the previous row, you can calculate and then shift:

    In [2]: df = pd.DataFrame({'a':[0,1,2], 'b':[0,10,20]})
    
    In [3]: df
    Out[3]:
       a   b
    0  0   0
    1  1  10
    2  2  20
    
    # a calculation based on other column
    In [4]: df['c'] = df['b'] + 1
    
    # shift the column
    In [5]: df['c'] = df['c'].shift()
    
    In [6]: df
    Out[6]:
       a   b     c
    0  0   0   NaN
    1  1  10   1.0
    2  2  20  11.0
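
The shifted column can also be used directly inside an expression, so current and previous values are combined in one step. A minimal sketch, assuming the same example frame; the column names `b_change` and `c` here are only illustrative, and `fill_value` requires a reasonably recent pandas:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [0, 10, 20]})

# shift() moves values down one row, so prev_b holds the previous row's 'b'.
prev_b = df['b'].shift()

# Combine the current and previous rows in a single vectorized expression.
# The first row has no predecessor, so its result is NaN.
df['b_change'] = df['b'] - prev_b

# fill_value supplies a default for the first row instead of NaN,
# which also keeps the column as integers.
df['c'] = df['a'] + df['b'].shift(fill_value=0)

print(df)
```

Choosing between NaN and a fill value for the first row depends on whether a missing predecessor should count as "unknown" or as zero.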
    

    If you want to do a calculation based on multiple rows, you could look at the rolling_apply function (http://pandas.pydata.org/pandas-docs/stable/computation.html#moving-rolling-statistics-moments and http://pandas.pydata.org/pandas-docs/stable/generated/pandas.rolling_apply.html#pandas.rolling_apply). Later pandas versions removed the top-level pd.rolling_apply in favour of the .rolling(window).apply(func) method on a Series or DataFrame.
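
For the multi-row case, a rough sketch using the `.rolling(...).apply(...)` method that current pandas provides in place of `rolling_apply`. The window size of 2 and the column names `b_window_sum` and `b_diff` are assumptions chosen for illustration:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [0, 10, 20]})

# Each window holds the previous row and the current row of 'b'.
# raw=True passes the window to the function as a NumPy array.
df['b_window_sum'] = df['b'].rolling(window=2).apply(lambda w: w.sum(), raw=True)

# Current value minus previous value, computed inside each window.
df['b_diff'] = df['b'].rolling(window=2).apply(lambda w: w[-1] - w[0], raw=True)

print(df)
```

The first row has no complete window, so both new columns start with NaN.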

  • 2021-01-23 09:49

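    You can use the DataFrame 'apply' function together with a decorator that keeps the previous row in a closure. The wrapper accepts 'kwargs' only so that it matches the signature 'apply' expects; it does not use them.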

    import pandas as pd
    
    df = pd.DataFrame({'a':[0,1,2], 'b':[0,10,20]})
    
    new_col = 'c'
    
    def apply_func_decorator(func):
        prev_row = {}
        def wrapper(curr_row, **kwargs):
            val = func(curr_row, prev_row)
            prev_row.update(curr_row)
            prev_row[new_col] = val
            return val
        return wrapper
    
    @apply_func_decorator
    def running_total(curr_row, prev_row):
        return curr_row['a'] + curr_row['b'] + prev_row.get('c', 0)
    
    df[new_col] = df.apply(running_total, axis=1)
    
    print(df)
    # Output will be:
    #    a   b   c
    # 0  0   0   0
    # 1  1  10  11
    # 2  2  20  33
    

    This example uses a decorator to store the previous row in a dictionary and then pass it to the function when Pandas calls it on the next row.

    Disclaimer 1: The 'prev_row' variable starts off empty for the first row so when using it in the apply function I had to supply a default value to avoid a 'KeyError'.

    Disclaimer 2: I am fairly certain this will be slower than a plain apply operation, but I did not do any tests to figure out how much.
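
For this particular running total, the recurrence c[i] = a[i] + b[i] + c[i-1] is just a cumulative sum, so a vectorized version is possible. This is an alternative sketch, not the decorator approach above, and it only applies when the dependence on the previous row can be expressed as a cumulative operation:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2], 'b': [0, 10, 20]})

# Row-wise sum of 'a' and 'b', then accumulate down the rows.
# This reproduces c[i] = a[i] + b[i] + c[i-1] with c[-1] taken as 0.
df['c'] = (df['a'] + df['b']).cumsum()

print(df)
```

When the previous-row dependence is more general than a sum, the stateful apply shown above or an explicit loop remains the fallback.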
