How to create lazily evaluated DataFrame columns in pandas

逝去的感伤 2021-02-05 11:16

I often have a large DataFrame df that holds the basic data, and I need to create many more columns holding derived data calculated from the basic columns.

2 answers
  •  星月不相逢
    2021-02-05 11:49

    You could subclass DataFrame and add each derived column as a property. For example:

    import pandas as pd
    
    class LazyFrame(pd.DataFrame):
        @property
        def derivative_col1(self):
            # Compute the derived values on access and store them as a real column.
            self['derivative_col1'] = result = self['basic_col1'] + self['basic_col2']
            return result
    
    x = LazyFrame({'basic_col1':[1,2,3],
                   'basic_col2':[4,5,6]})
    print(x)
    #    basic_col1  basic_col2
    # 0           1           4
    # 1           2           5
    # 2           3           6
    

    Accessing the property (via x.derivative_col1, below) calls the derivative_col1 function defined in LazyFrame. This function computes the result and adds the derived column to the LazyFrame instance:

    print(x.derivative_col1)
    # 0    5
    # 1    7
    # 2    9
    
    print(x)
    #    basic_col1  basic_col2  derivative_col1
    # 0           1           4                5
    # 1           2           5                7
    # 2           3           6                9
    

    Note that if you modify a basic column:

    x['basic_col1'] *= 10
    

    the derived column is not automatically updated:

    print(x['derivative_col1'])
    # 0    5
    # 1    7
    # 2    9
    

    But if you access the property, the values are recomputed and the stored column is overwritten:

    print(x.derivative_col1)
    # 0    14
    # 1    25
    # 2    36
    
    print(x)
    #    basic_col1  basic_col2  derivative_col1
    # 0          10           4               14
    # 1          20           5               25
    # 2          30           6               36
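
Since the question mentions many derived columns, writing one property per column by hand may get repetitive. A minimal sketch of one way to reduce the boilerplate, assuming a small helper that builds the property from a column name and a function. The names lazy_column, col_sum and col_ratio are hypothetical and are not part of pandas:

```python
import pandas as pd

def lazy_column(name, func):
    """Hypothetical helper: build a property that computes func(frame)
    on access and stores the result under the column `name`."""
    def getter(self):
        self[name] = result = func(self)
        return result
    return property(getter)

class LazyFrame(pd.DataFrame):
    # Each derived column is declared once; nothing is computed yet.
    col_sum = lazy_column('col_sum', lambda df: df['basic_col1'] + df['basic_col2'])
    col_ratio = lazy_column('col_ratio', lambda df: df['basic_col1'] / df['basic_col2'])

x = LazyFrame({'basic_col1': [1, 2, 3],
               'basic_col2': [4, 5, 6]})

print(list(x.columns))     # ['basic_col1', 'basic_col2']
print(x.col_sum.tolist())  # [5, 7, 9]
print(list(x.columns))     # ['basic_col1', 'basic_col2', 'col_sum']
```

As with the hand-written property, the stored column is refreshed only when the property is accessed, so x['col_sum'] can still be stale after a basic column changes.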
    
