Question
I am trying to create a set of rolling covariance matrices on financial data (window size = 60). returns is a 125x3 DataFrame.
import pandas as pd
roll_rets = returns.rolling(window=60)
Omega = roll_rets.cov()
Omega is a 375x3 data frame with what looks like a multi-index - i.e. there are 3 values for each timestamp.
What I actually want this to return is a set of 66 3x3 covariance matrices (i.e. one for each period), but I can't work out how to iterate over returns correctly to do this. I think I'm missing something obvious. Thanks.
Answer 1:
Firstly: a MultiIndex DataFrame is an iterable object. (Try bool(pd.DataFrame.__iter__).) Note that iterating over a DataFrame directly yields its column labels, not its sub-frames. There are several StackOverflow questions on iterating through the sub-frames of a MultiIndex DataFrame, if you are interested.
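As a minimal sketch of iterating over those sub-frames, one option is to group on the outer (date) level of the index. The synthetic returns frame, the seed, and the name matrices are assumptions made for illustration; they are not part of the original question.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the returns data (assumption: 125 rows, 3 assets).
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.standard_normal((125, 3)),
                       index=pd.date_range('2010-01-01', periods=125),
                       columns=list('abc'))

# Rolling covariance; dropna() removes the first 59 incomplete windows.
Omega = returns.rolling(60).cov().dropna()

# Group on the outer (date) level; each group is one 3x3 covariance block.
matrices = {}
for date, block in Omega.groupby(level=0):
    # Drop the date level so the block is indexed by asset name only.
    matrices[date] = block.droplevel(0)
```

Each value in matrices is a 3x3 DataFrame labelled by asset on both axes, so a lookup such as matrices[returns.index[-1]].loc['a', 'b'] stays readable.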
But to your question directly, here is a dict: the keys are the (end) dates, and each value is a 3x3 NumPy array.
import pandas as pd
import numpy as np
Omega = (pd.DataFrame(np.random.randn(125,3),
index=pd.date_range('1/1/2010', periods=125),
columns=list('abc'))
.rolling(60)
.cov()
.dropna()) # this will get you to 66 windows instead of 125 with NaNs
dates = Omega.index.get_level_values(0).unique() # one end date per window (the last 66 dates of your base returns index)
d = {date: Omega.loc[date].values for date in dates} # without .unique(), each matrix would be built three times
Is this efficient? No, not very. You are creating a separate NumPy array for each value of the dict. Each NumPy array has its own dtype, etc. The DataFrame as it is now is arguably well-suited for your purpose. But one other solution is to create a single NumPy array by expanding the ndim of Omega.values:
Omega.values.reshape(66, 3, 3)
Here each element is a matrix (again, easily iterable, but loses the date indexing that you had in your DataFrame).
Omega.values.reshape(66, 3, 3)[-1] # last matrix/final date
Out[29]:
array([[ 0.80865977, -0.06134767, 0.04522074],
[-0.06134767, 0.67492558, -0.12337773],
[ 0.04522074, -0.12337773, 0.72340524]])
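If the date lookup matters, a possible workaround is to keep the unique end dates next to the stacked array and translate a date into a position. The sketch below assumes the rows of Omega are ordered by date and then by asset, which is how rolling(...).cov() lays them out. The names returns, stacked, and dates, as well as the seed, are illustrative rather than taken from the original answer.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the returns data (assumption: 125 rows, 3 assets).
rng = np.random.default_rng(0)
returns = pd.DataFrame(rng.standard_normal((125, 3)),
                       index=pd.date_range('2010-01-01', periods=125),
                       columns=list('abc'))
Omega = returns.rolling(60).cov().dropna()

# Derive the dimensions instead of hard-coding 66 and 3.
n_assets = Omega.shape[1]
stacked = Omega.to_numpy().reshape(-1, n_assets, n_assets)

# One end date per window, in the same order as the first axis of `stacked`.
dates = Omega.index.get_level_values(0).unique()

# Translate a date into a position along the first axis.
pos = dates.get_loc(returns.index[-1])
last_matrix = stacked[pos]
```

Using -1 in reshape lets NumPy infer the number of windows, so the same lines work unchanged for a different window size or number of assets.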
Source: https://stackoverflow.com/questions/45062622/create-rolling-covariance-matrix-in-pandas