问题
I have a dataframe with monthly financial data:
In [89]: vfiax_monthly.head()
Out[89]:
year month day d open close high low volume aclose
2003-01-31 2003 1 31 731246 64.95 64.95 64.95 64.95 0 64.95
2003-02-28 2003 2 28 731274 63.98 63.98 63.98 63.98 0 63.98
2003-03-31 2003 3 31 731305 64.59 64.59 64.59 64.59 0 64.59
2003-04-30 2003 4 30 731335 69.93 69.93 69.93 69.93 0 69.93
2003-05-30 2003 5 30 731365 73.61 73.61 73.61 73.61 0 73.61
I'm trying to calculate the returns like that:
In [90]: returns = (vfiax_monthly.open[1:] - vfiax_monthly.open[:-1])/vfiax_monthly.open[1:]
But I'm getting only zeroes:
In [91]: returns.head()
Out[91]:
2003-01-31 NaN
2003-02-28 0
2003-03-31 0
2003-04-30 0
2003-05-30 0
Freq: BM, Name: open
I think that's because the arithmetic operations get aligned on the index and that makes the [1:]
and [:-1]
useless.
My workaround is:
In [103]: returns = (vfiax_monthly.open[1:].values - vfiax_monthly.open[:-1].values)/vfiax_monthly.open[1:].values
In [104]: returns = pd.Series(returns, index=vfiax_monthly.index[1:])
In [105]: returns.head()
Out[105]:
2003-02-28 -0.015161
2003-03-31 0.009444
2003-04-30 0.076362
2003-05-30 0.049993
2003-06-30 0.012477
Freq: BM
Is there a better way to calculate the returns? I don't like the conversion to array and then back to Series.
回答1:
Instead of slicing, use .shift
to move the index position of values in a DataFrame/Series. For example:
returns = (vfiax_monthly.open - vfiax_monthly.open.shift(1))/vfiax_monthly.open.shift(1)
This is what pct_change
is doing under the bonnet. You can also use it for other functions e.g.:
(3*vfiax_monthly.open + 2*vfiax_monthly.open.shift(1))/5
You might also want to looking into the rolling and window functions for other types of analysis of financial data.
回答2:
The easiest way to do this is to use the DataFrame.pct_change() method.
Here is a quick example
In[1]: aapl = get_data_yahoo('aapl', start='11/1/2012', end='11/13/2012')
In[2]: appl
Out[2]:
Open High Low Close Volume Adj Close
Date
2012-11-01 598.22 603.00 594.17 596.54 12903500 593.83
2012-11-02 595.89 596.95 574.75 576.80 21406200 574.18
2012-11-05 583.52 587.77 577.60 584.62 18897700 581.96
2012-11-06 590.23 590.74 580.09 582.85 13389900 580.20
2012-11-07 573.84 574.54 555.75 558.00 28344600 558.00
2012-11-08 560.63 562.23 535.29 537.75 37719500 537.75
2012-11-09 540.42 554.88 533.72 547.06 33211200 547.06
2012-11-12 554.15 554.50 538.65 542.83 18421500 542.83
2012-11-13 538.91 550.48 536.36 542.90 19033900 542.90
In[3]: aapl.pct_change()
Out[3]:
Open High Low Close Volume Adj Close
Date
2012-11-01 NaN NaN NaN NaN NaN NaN
2012-11-02 -0.003895 -0.010033 -0.032684 -0.033091 0.658945 -0.033090
2012-11-05 -0.020759 -0.015378 0.004959 0.013558 -0.117186 0.013550
2012-11-06 0.011499 0.005053 0.004311 -0.003028 -0.291453 -0.003024
2012-11-07 -0.027769 -0.027423 -0.041959 -0.042635 1.116864 -0.038263
2012-11-08 -0.023020 -0.021426 -0.036815 -0.036290 0.330747 -0.036290
2012-11-09 -0.036049 -0.013073 -0.002933 0.017313 -0.119522 0.017313
2012-11-12 0.025406 -0.000685 0.009237 -0.007732 -0.445323 -0.007732
2012-11-13 -0.027502 -0.007250 -0.004251 0.000129 0.033244 0.000129
回答3:
The best way to calculate forward looking returns without any chance of bias is to use the built in function pd.DataFrame.pct_change()
. In your case all you need to use is this function since you have monthly data, and you are looking for the monthly return.
If, for example, you wanted to look at the 6 month return, you would just set the param
df.pct_change(periods = 6)
and that will give you the 6 month percent return.
Because you have a relatively small data set, the easiest way is to resample on the parameters that you need to calculate the data on then use the pct_change()
function again.
However because of the nice properties of log
it is common to use the formula for calculating returns (if you plan on doing statistics on the return series):
Which you would implement as such:
log_return = np.log(vfiax_monthly.open / vfiax_monthly.open.shift())
回答4:
Could also use a mix of diff
and shift
methods of pandas series:
retrun = vfiax_monthly.open.diff()/vfiax_monthly.open.shift(1)
来源:https://stackoverflow.com/questions/13385663/calculating-returns-from-a-dataframe-with-financial-data