PyMC - wishart distribution for covariance estimate

Submitted by 时光毁灭记忆、已成空白 on 2019-12-10 11:49:41

Question


I need to model and estimate a variance-covariance matrix from asset class returns, so I was looking at the stock returns example given in chapter 6 of https://github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers

Here is my simple implementation: I start with a sample drawn from a multivariate normal with a known mean and variance-covariance matrix, then try to estimate those parameters using a non-informative prior.

The estimate is different from the known parameters, so I'm not sure whether my implementation is correct. I'd appreciate it if someone could point out what I'm doing wrong.

import numpy as np
import pandas as pd
import pymc as pm


p=3
mu=[.03,.05,-.02]
cov_matrix= [[.025,0.0075, 0.00175],[0.0075,.007,0.00135],[0.00175,0.00135,.00043]]

n_obs=10000
x=np.random.multivariate_normal(mu,cov_matrix,n_obs)

prior_mu=np.ones(p)

prior_sigma = np.eye(p)


post_mu = pm.Normal("returns",prior_mu,1,size=p)
post_cov_matrix_inv = pm.Wishart("cov_matrix_inv",n_obs,np.linalg.inv(cov_matrix))

obs = pm.MvNormal( "observed returns", post_mu, post_cov_matrix_inv, observed = True, value = x )

model = pm.Model( [obs, post_mu, post_cov_matrix_inv] )
mcmc = pm.MCMC()

mcmc.sample( 5000, 2000, 3 )

mu_samples = mcmc.trace("returns")[:]
mu_samples.mean(axis=0)
cov_inv_samples = mcmc.trace("cov_matrix_inv")[:]
mean_covariance_matrix = np.linalg.inv( cov_inv_samples.mean(axis=0) )
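Before trusting the MCMC output, it can help to compare against the plain sample estimates, which with 10000 observations should already sit very close to the truth and give a baseline for the posterior means. A standalone NumPy sketch (the seed and the `default_rng` Generator API are my own choices, not from the original post):

```python
import numpy as np

p = 3
mu = [0.03, 0.05, -0.02]
cov_matrix = [[0.025, 0.0075, 0.00175],
              [0.0075, 0.007, 0.00135],
              [0.00175, 0.00135, 0.00043]]

rng = np.random.default_rng(42)
x = rng.multivariate_normal(mu, cov_matrix, 10_000)

# With 10000 draws both moment estimates should match the true
# parameters to a few decimal places.
print(x.mean(axis=0))            # close to mu
print(np.cov(x, rowvar=False))   # close to cov_matrix
```

If the posterior means land far from these sample estimates, the problem is in the model setup rather than in the data.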

Answer 1:


Here are some suggestions that can improve the code and the inference:

  1. I would change pm.Wishart("cov_matrix_inv",n_obs,np.linalg.inv(cov_matrix)) to pm.Wishart("cov_matrix_inv",n_obs,np.eye(3)), as it is more objective (and with 10000 data points your prior is not going to matter much anyway).

  2. mcmc = pm.MCMC() should be mcmc = pm.MCMC(model)

  3. mcmc.sample( 5000, 2000, 3 ) There are far too few samples here. The second half of MCMC, Monte Carlo, is strongest when there are lots of samples: I mean tens of thousands. Here you keep only 1000, so the error caused by Monte Carlo will be quite high (the error decreases with increasing sample size). Furthermore, the MCMC has likely not converged after 2000 burn-in samples. You can check convergence by importing plot from pymc.Matplot and calling plot(mcmc). I used mcmc.sample( 25000, 15000, 1 ) and was getting better results.
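The claim in point 3, that Monte Carlo error shrinks with sample size, can be checked directly with a standalone NumPy sketch independent of PyMC (the seed and replication count here are arbitrary choices for illustration):

```python
import numpy as np

# Averaging n draws of a Normal(0, 1) variable estimates its mean (0)
# with standard error 1/sqrt(n), so ~1000 retained samples carry about
# 5x the Monte Carlo error of ~25000 samples. Repeating the experiment
# 200 times exposes the typical (RMS) error at each n.
rng = np.random.default_rng(0)
rms_error = {}
for n in (1_000, 25_000):
    errors = np.array([rng.standard_normal(n).mean() for _ in range(200)])
    rms_error[n] = np.sqrt((errors ** 2).mean())
    print(f"n={n:>6}: RMS error {rms_error[n]:.4f}  "
          f"(theory 1/sqrt(n) = {1 / np.sqrt(n):.4f})")
```

The same 1/sqrt(n) logic applies to the effective number of post-burn-in MCMC samples, which is why 25000 iterations with a long burn-in behaves so much better than 1000 retained draws.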

I imagine the reason you used so few samples was performance. Much of that cost comes from the large number of observations: you have 10000, which might be higher than what you will actually have in practice.

And remember, much of the value of Bayesian inference is being handed posterior samples: taking only the mean of those samples seems like a waste - think about using the samples in a loss function (see chapter 5 of the book).




Answer 2:


Please note that if you want to use an informative prior, you should not pass np.linalg.inv(cov_matrix) to Wishart, but just cov_matrix. To be exact, you should use cov_matrix * n_obs in order to scale it properly.
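The scaling comes from the Wishart mean: E[Wishart(df, V)] = df * V, so the matrix parameter must absorb a factor of the degrees of freedom for the prior mean to land on the intended matrix. A quick simulation with scipy illustrates this (using scipy.stats.wishart is my own choice here; PyMC 2's exact Wishart parameterisation should be checked against its documentation):

```python
import numpy as np
from scipy.stats import wishart

# Choose the scale matrix V = I/df so that E[W] = df * V = I.
df = 50
V = np.eye(3) / df

# The empirical mean of many Wishart draws should recover df * V.
samples = wishart.rvs(df=df, scale=V, size=20_000, random_state=0)
print(samples.mean(axis=0))   # close to the 3x3 identity matrix
```

Conversely, leaving the matrix parameter unscaled shifts the prior mean by a factor of df, which is why the answer recommends multiplying by n_obs.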



来源:https://stackoverflow.com/questions/21466486/pymc-wishart-distribution-for-covariance-estimate
