I want to apply a custom function to create a derived column called population2050, based on two columns that already exist in my data frame.
import pandas
You were almost there:
facts['pop2050'] = facts.apply(lambda row: final_pop(row['population'], row['population_growth']), axis=1)
Using a lambda lets your function keep its specific (interesting) parameters in its signature, rather than receiving them bundled in a 'row'.
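A minimal, self-contained sketch of the lambda approach. The question does not show the body of final_pop, so the continuous-growth formula below (population * e^(35 * growth)) is an assumption, and the sample data are made up for illustration.

```python
import math

import pandas as pd


def final_pop(population, growth_rate):
    # Hypothetical body (assumed, not from the question): continuous growth over 35 years
    return population * math.exp(growth_rate * 35)


# Made-up sample data using the column names from the answer
facts = pd.DataFrame({
    "population": [1000.0, 2000.0],
    "population_growth": [0.0, 0.01],
})

# axis=1 passes each row to the lambda, which unpacks only the two fields final_pop needs
facts["pop2050"] = facts.apply(
    lambda row: final_pop(row["population"], row["population_growth"]), axis=1
)
print(facts)
```

Because final_pop takes two plain numbers, it can also be called directly, e.g. final_pop(1000.0, 0.02), without building a row.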
You can achieve the same result without needing DataFrame.apply(). Pandas Series (or DataFrame columns) can be passed directly as arguments to NumPy functions and even used with built-in Python operators, which are applied element-wise. In your case, it is as simple as the following:
import numpy as np
facts['pop2050'] = facts['population'] * np.exp(35 * facts['population_growth'])
This multiplies each element of the population_growth column by 35, applies NumPy's exp() function to that new series (35 * population_growth), and then multiplies the result by population.
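To check that the vectorized expression matches a row-by-row apply, here is a small sketch on made-up data (the values are illustrative only). The vectorized form is usually faster on large frames because it avoids one Python-level function call per row.

```python
import numpy as np
import pandas as pd

# Made-up sample data; column names follow the answer
facts = pd.DataFrame({
    "population": [1000.0, 2000.0, 500.0],
    "population_growth": [0.0, 0.01, -0.02],
})

# Vectorized: each operation works on whole columns, element by element
vectorized = facts["population"] * np.exp(35 * facts["population_growth"])

# Row-by-row equivalent using DataFrame.apply, for comparison
row_wise = facts.apply(
    lambda row: row["population"] * np.exp(35 * row["population_growth"]), axis=1
)

facts["pop2050"] = vectorized
print(np.allclose(vectorized, row_wise))  # True
```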
Define your function:
def function(x):
    # your operation
    return x
Call your function like this:
df['column']=df['column'].apply(function)
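A runnable sketch of this pattern. The function add_ten_percent and the data are hypothetical stand-ins for "your operation". Note that Series.apply calls the function once per element of a single column, so it sees one value at a time rather than a whole row.

```python
import pandas as pd


def add_ten_percent(x):
    # Hypothetical operation: scale a single value by 10 percent
    return x * 1.1


# Made-up data with a column literally named "column", as in the answer
df = pd.DataFrame({"column": [100.0, 250.0]})

# Series.apply calls add_ten_percent once for each element of the column
df["column"] = df["column"].apply(add_ten_percent)
print(df)
```

To combine two columns, as the question asks, the function needs access to both values at once, for example via DataFrame.apply with axis=1.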
With axis=1, apply passes your function the entire row. Assuming your two columns are called initial_pop and growth_rate, adjust it like this:
import math

def final_pop(row):
    return row.initial_pop*math.e**(row.growth_rate*35)
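A self-contained sketch showing the row-based function in use. The data frame df and its values are made up; only the column names initial_pop and growth_rate come from the answer.

```python
import math

import pandas as pd


def final_pop(row):
    # row is a Series holding one row; attribute access works here because
    # the column names are valid Python identifiers
    return row.initial_pop * math.e ** (row.growth_rate * 35)


# Hypothetical data using the column names assumed in the answer
df = pd.DataFrame({
    "initial_pop": [1000.0, 2000.0],
    "growth_rate": [0.0, 0.01],
})

# axis=1 passes each row (not each column) to final_pop
df["pop2050"] = df.apply(final_pop, axis=1)
print(df)
```

If a column name clashes with an existing Series attribute or contains spaces, row['name'] indexing is the safer choice over row.name-style attribute access.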