How to construct a column of a data frame recursively with pandas (Python)?

醉酒成梦 2021-01-05 22:15

Given a data frame df such as:

id_      val     
11111    12
12003    22
88763    19
43721    77
...

I wish to add a column

3 Answers
  • 2021-01-05 22:37

    Recursive functions are not easily vectorisable. However, you can optimize your algorithm with numba. This should be preferable to a regular loop.

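    import numpy as np
    from numba import jit
    
    @jit(nopython=True)
    def foo(val):
        # diff[i] depends on diff[i-1], so the array is filled sequentially
        diff = np.zeros(val.shape)
        diff[0] = val[0] * 0.4
        for i in range(1, diff.shape[0]):
            diff[i] = (val[i] - diff[i-1]) * 0.4 + diff[i-1]
        return diff
    
    # Pass the underlying NumPy array, which numba can compile against
    df['diff'] = foo(df['val'].values)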
    
    print(df)
    
         id_  val     diff
    0  11111   12   4.8000
    1  12003   22  11.6800
    2  88763   19  14.6080
    3  43721   77  39.5648
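
    As a side note, the recurrence above simplifies to diff[i] = 0.4 * val[i] + 0.6 * diff[i-1], which has the shape of an exponentially weighted moving average that starts from 0. A minimal sketch, assuming the sample data from the question and that this equivalence is what you want; the names `alpha` and `seeded` are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({"id_": [11111, 12003, 88763, 43721],
                   "val": [12, 22, 19, 77]})

alpha = 0.4  # smoothing factor taken from the loop above

# Prepend a 0 so the smoothing starts from 0 rather than from val[0];
# with adjust=False pandas computes y[t] = (1 - alpha) * y[t-1] + alpha * x[t].
seeded = pd.concat([pd.Series([0.0]), df["val"].astype(float)],
                   ignore_index=True)
smoothed = seeded.ewm(alpha=alpha, adjust=False).mean()

# Drop the artificial seed row before assigning back
df["diff"] = smoothed.iloc[1:].to_numpy()
```

    This only works because this particular recurrence is linear with a constant factor; a general recursive formula would still need a loop.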
    
  • 2021-01-05 22:48

    You can use:

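    # Assumes the default 0..n-1 integer index, since .loc is label-based
    df.loc[0, 'diff'] = df.loc[0, 'val'] * 0.4
    
    # Each row is computed from the previous row's result
    for i in range(1, len(df)):
        df.loc[i, 'diff'] = (df.loc[i, 'val'] - df.loc[i-1, 'diff']) * 0.4 + df.loc[i-1, 'diff']
    
    print(df)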
         id_  val     diff
    0  11111   12   4.8000
    1  12003   22  11.6800
    2  88763   19  14.6080
    3  43721   77  39.5648
    

    The calculation is inherently iterative: each value depends on the result of the previous step, which complicates vectorization. You could use apply with a function that performs the same calculation as the loop, but behind the scenes this would also be a loop.
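
    To make the "it is still a loop" point concrete, here is a small sketch of the same recurrence written with itertools.accumulate from the standard library. The function name `running_diff` and the plain-list input are assumptions for illustration, not part of the original answer:

```python
from itertools import accumulate

def running_diff(values, factor=0.4):
    """Apply diff[i] = (val[i] - diff[i-1]) * factor + diff[i-1], starting from 0."""
    steps = accumulate(
        values,
        lambda prev, v: (v - prev) * factor + prev,
        initial=0.0,  # requires Python 3.8+
    )
    next(steps)  # skip the seed value itself
    return list(steps)

result = running_diff([12, 22, 19, 77])
```

    With a data frame, something like df['diff'] = running_diff(df['val']) should give the same column as the loop, since accumulate walks the values one at a time in exactly the same order.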

  • 2021-01-05 22:54

    If you are using apply in pandas, you should not reference the data frame again inside the lambda function.

    In all cases, the object you work with inside the lambda function should be 'row'.
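
    A short sketch of what that looks like, assuming the sample data from the question; the name `scaled` is illustrative. Note that a row-wise apply only sees the current row, so on its own it cannot read the previous row's result and does not express the recurrence:

```python
import pandas as pd

df = pd.DataFrame({"id_": [11111, 12003, 88763, 43721],
                   "val": [12, 22, 19, 77]})

# With axis=1 each call receives a single row as a Series;
# index into `row`, not into `df`, inside the lambda.
scaled = df.apply(lambda row: row["val"] * 0.4, axis=1)
```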
