问题
I am trying to fill a dataframe (elist) with cumulative returns of firms and cumulative market returns. This can be done by looping through the elist dataframe using iterrows
, see this link. However, this is slow.
I am looking for a more potent, faster solution.
The raw returns that are the inputs for the cumulative return calculations originate from two dataframes (ri, rm). The results should then be recorded in columns of elist. See example below, using the data in this file.
Before running the iterrows
loop, elist looks like:
permno begdat enddat return vwretd
0 11628 2012-03-31 2013-03-31 NaN NaN
1 11628 2012-06-30 2013-06-30 NaN NaN
2 11628 2012-09-30 2013-09-30 NaN NaN
3 11628 2012-12-31 2013-12-31 NaN NaN
4 11628 2013-03-31 2014-03-31 NaN NaN
After running the loop elist should look like:
permno begdat enddat return vwretd
0 11628 2012-03-31 2013-03-31 0.212355 0.133429
1 11628 2012-06-30 2013-06-30 0.274788 0.198380
2 11628 2012-09-30 2013-09-30 0.243590 0.198079
3 11628 2012-12-31 2013-12-31 0.299277 0.304479
4 11628 2013-03-31 2014-03-31 0.303147 0.208454
Here is the code that relies on iterrows, which is slow:
import os,sys
import pandas as pd
import numpy as np
rm = pd.read_csv('rm_so.csv') # market return
ri = pd.read_csv('ri_so.csv') # firm return
elist = pd.read_csv('elist_so.csv') # table to be filled with cumlative returns over a period (begdat to enddat)
for index, row in elist.iterrows():
#fill cumulative market return
elist.loc[index, 'vwretd']=rm.loc[(rm['date']>row['begdat']) & (rm['date']<=row['enddat']),'vwretd'].product()-1
#fill cumulative firm return
r = ri.loc[(ri['permno']==row['permno']),]
elist.loc[index, 'return'] = r.loc[(r['date']>row['begdat']) & (r['date']<=row['enddat']),'ret'].product()-1
It would be great to see this process running faster!
来源:https://stackoverflow.com/questions/42601765/how-to-efficiently-fill-a-column-of-dataframe-with-values-calculated-from-other