A lot of times, I have a big dataframe df to hold the basic data, and need to create many more columns to hold derived data calculated from the basic data columns.
You could subclass DataFrame and add the column as a property. For example,
import pandas as pd

class LazyFrame(pd.DataFrame):
    @property
    def derivative_col1(self):
        # Compute the derived values and store them as a regular column
        self['derivative_col1'] = result = self['basic_col1'] + self['basic_col2']
        return result

x = LazyFrame({'basic_col1':[1,2,3],
               'basic_col2':[4,5,6]})
print(x)
#    basic_col1  basic_col2
# 0           1           4
# 1           2           5
# 2           3           6
Accessing the property (via x.derivative_col1, below) calls the derivative_col1 function defined in LazyFrame. This function computes the result and adds the derived column to the LazyFrame instance:
print(x.derivative_col1)
# 0    5
# 1    7
# 2    9
print(x)
#    basic_col1  basic_col2  derivative_col1
# 0           1           4                5
# 1           2           5                7
# 2           3           6                9
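Since the question mentions many derived columns, the property boilerplate could be factored into a small decorator. This is only a sketch: derived_column, DerivedFrame, total and ratio are hypothetical names chosen for illustration and are not part of pandas.

```python
import pandas as pd

def derived_column(func):
    """Hypothetical helper (not part of pandas): wrap func(frame) -> Series
    in a property that also stores the result as a column named after func."""
    name = func.__name__

    def getter(self):
        result = func(self)
        self[name] = result  # store the values as a regular column
        return result

    return property(getter)

class DerivedFrame(pd.DataFrame):
    @derived_column
    def total(self):
        return self['basic_col1'] + self['basic_col2']

    @derived_column
    def ratio(self):
        return self['basic_col1'] / self['basic_col2']

frame = DerivedFrame({'basic_col1': [1, 2, 3],
                      'basic_col2': [4, 5, 6]})
frame.total   # computes the values and adds the 'total' column
frame.ratio   # computes the values and adds the 'ratio' column
print(list(frame.columns))
# ['basic_col1', 'basic_col2', 'total', 'ratio']
```

Each decorated method behaves like derivative_col1 above: nothing is computed until the attribute is first accessed.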
Note that if you modify a basic column:
x['basic_col1'] *= 10
the derived column is not automatically updated:
print(x['derivative_col1'])
# 0    5
# 1    7
# 2    9
But if you access the property, the values are recomputed:
print(x.derivative_col1)
# 0    14
# 1    25
# 2    36
print(x)
#    basic_col1  basic_col2  derivative_col1
# 0          10           4               14
# 1          20           5               25
# 2          30           6               36
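One more caveat that the example does not cover: pandas builds the result of most operations (selection, copy, and so on) through the _constructor property, which for a plain subclass still returns pd.DataFrame. The result would then lose the derivative_col1 property and expose only whatever column values were stored earlier. A minimal sketch of one way around this, assuming the _constructor override described in the pandas documentation on subclassing:

```python
import pandas as pd

class LazyFrame(pd.DataFrame):
    @property
    def _constructor(self):
        # Ask pandas to build the results of operations as LazyFrame
        return LazyFrame

    @property
    def derivative_col1(self):
        self['derivative_col1'] = result = self['basic_col1'] + self['basic_col2']
        return result

x = LazyFrame({'basic_col1': [1, 2, 3],
               'basic_col2': [4, 5, 6]})

# Boolean selection followed by an explicit copy; both keep the subclass
subset = x[x['basic_col1'] > 1].copy()
print(type(subset).__name__)
# LazyFrame
print(subset.derivative_col1.tolist())
# [7, 9]
```

The explicit .copy() is there so that adding the derived column to subset does not write into a slice of x.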