Reference previous row when iterating through dataframe

夕颜 2021-01-25 11:13

Is there a simple way to reference the previous row when iterating through a dataframe? In the following dataframe I would like column B to change to 1 when A > 1

3 Answers
  • 2021-01-25 11:34

    Similar question here: Reference values in the previous row with map or apply.
    My impression is that pandas should handle iterations and we shouldn't have to do it on our own... Therefore, I chose to use the DataFrame 'apply' method.

    Here is the same answer I posted on the other question linked above.

    You can use the DataFrame 'apply' method with a decorator whose closure stores the previous row (the wrapper's 'kwargs' parameter is accepted but left unused).

    import pandas as pd
    
    df = pd.DataFrame({'a':[0,1,2], 'b':[0,10,20]})
    
    new_col = 'c'
    
    def apply_func_decorator(func):
        prev_row = {}
        def wrapper(curr_row, **kwargs):
            val = func(curr_row, prev_row)
            prev_row.update(curr_row)
            prev_row[new_col] = val
            return val
        return wrapper
    
    @apply_func_decorator
    def running_total(curr_row, prev_row):
        return curr_row['a'] + curr_row['b'] + prev_row.get('c', 0)
    
    df[new_col] = df.apply(running_total, axis=1)
    
    print(df)
    # Output will be:
    #    a   b   c
    # 0  0   0   0
    # 1  1  10  11
    # 2  2  20  33
    

    This example uses a decorator to store the previous row in a dictionary and then pass it to the function when Pandas calls it on the next row.

    Disclaimer 1: The 'prev_row' variable starts off empty for the first row so when using it in the apply function I had to supply a default value to avoid a 'KeyError'.

    Disclaimer 2: I am fairly certain this will slow down the apply operation, but I did not run any tests to measure by how much.
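
For this particular running total, a vectorized alternative may be worth comparing against the decorator approach. The following is only a sketch using the same toy data as above; it relies on `cumsum`, which is a different technique from the answer's closure, and it only applies when the dependence on the previous row is a plain cumulative sum.

```python
import pandas as pd

# Same toy data as in the answer above.
df = pd.DataFrame({"a": [0, 1, 2], "b": [0, 10, 20]})

# Each row's 'c' is a + b plus the previous row's 'c',
# which is exactly the cumulative sum of (a + b).
df["c"] = (df["a"] + df["b"]).cumsum()

print(df)
```

When the recurrence is more involved than a sum, the closure-based `apply` shown earlier (or an explicit loop) is still needed.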

  • 2021-01-25 11:38

    Try this. If the first value is neither >= 1 nor < -1, set it to 0 or whatever you like.

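    import numpy as np

    df["B"] = np.nan  # start with "no signal" everywhere (keeps B numeric)
    df["B"] = np.where(df["A"] >= 1, 1, df["B"])   # set 1 where A >= 1
    df["B"] = np.where(df["A"] < -1, -1, df["B"])  # set -1 where A < -1
    df["B"] = df["B"].ffill().fillna(0)  # carry the last signal forward; 0 before the first one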
    

    This solves the problem as stated, but the general way to reference the previous row is to use .shift() or to look up the row at index - 1.
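
As a rough illustration of the `.shift()` idea, the sketch below lines up each row with the value from the row before it. The sample values and the column names `A_prev` and `crossed_up` are made up for the example and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data; the question's original frame is not shown.
df = pd.DataFrame({"A": [0.5, 1.5, 0.2, -1.2, -0.3, 2.0]})

# shift(1) moves every value down one row, so each row sees the
# previous row's A. The first row has no predecessor and gets NaN.
df["A_prev"] = df["A"].shift(1)

# Example use: flag rows where A rises above 1 while the previous
# row was not above 1 (a NaN predecessor compares as False).
df["crossed_up"] = (df["A"] > 1) & ~(df["A_prev"] > 1)

print(df)
```

Because the comparison is done on whole columns, no explicit iteration over rows is needed.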

  • 2021-01-25 11:40

    Is this what you are trying to do?

    In [38]: df = DataFrame(randn(10,2),columns=list('AB'))
    
    In [39]: df['B'] = np.nan
    
    In [40]: df.loc[df.A<-1,'B'] = -1
    
    In [41]: df.loc[df.A>1,'B'] = 1
    
    In [42]: df.ffill()
    Out[42]: 
              A  B
    0 -1.186808 -1
    1 -0.095587 -1
    2 -1.921372 -1
    3 -0.772836 -1
    4  0.016883 -1
    5  0.350778 -1
    6  0.165055 -1
    7  1.101561  1
    8 -0.346786  1
    9 -0.186263  1
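
To see why the forward fill answers a "previous row" question, it may help to compare it with an explicit loop that carries the state along. This is only a sketch on fixed, made-up values (the session above uses random data), and the column names `B_loop` and `B_ffill` are hypothetical.

```python
import numpy as np
import pandas as pd

# Fixed sample values so the result is reproducible.
df = pd.DataFrame({"A": [-1.2, -0.1, 0.4, 1.1, -0.3, -1.5, 0.2]})

# Explicit loop: keep the previous row's state unless A crosses a threshold.
state = np.nan
looped = []
for a in df["A"]:
    if a > 1:
        state = 1.0
    elif a < -1:
        state = -1.0
    looped.append(state)
df["B_loop"] = looped

# Vectorized version: mark only the threshold rows, then forward-fill.
b = pd.Series(np.nan, index=df.index)
b.loc[df["A"] > 1] = 1.0
b.loc[df["A"] < -1] = -1.0
df["B_ffill"] = b.ffill()

print(df)
```

On this data both columns agree, which suggests that the forward fill is doing the "remember the previous row" work that the loop does by hand.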
    