I have some observational data for which I would like to estimate parameters, and I thought it would be a good opportunity to try out PYMC3.
My data is structured
There is nothing fundamentally wrong with your approach, except for the pitfalls of any Bayesian MCMC analysis: (1) non-convergence, (2) the priors, (3) the model.
Non-convergence: I find a traceplot that looks like this:
This is not a good thing, and to see more clearly why, I would change the traceplot code to show only the second-half of the trace, traceplot(success_samples[10000:])
:
The Prior: One major challenge for convergence is your prior on total_lambda_tau
, which is a exemplar pitfall in Bayesian modeling. Although it might appear quite uninformative to use prior Uniform('total_lambda_tau', lower=0, upper=100000)
, the effect of this is to say that you are quite certain that total_lambda_tau
is large. For example, the probability that it is less than 10 is .0001. Changing the prior to
total_lambda_tau= Uniform('total_lambda_tau', lower=0, upper=100)
total_lambda_mu = Uniform('total_lambda_mu', lower=0, upper=1000)
results in a traceplot that is more promising:
This is still not what I look for in a traceplot, however, and to get something more satisfactory, I suggest using a "sequential scan Metropolis" step (which is what PyMC2 would default to for an analogous model). You can specify this as follows:
step = pm.CompoundStep([pm.Metropolis([total_lambda_mu]),
pm.Metropolis([total_lambda_tau]),
pm.Metropolis([total_lambda]),
pm.Metropolis([loss_lambda_factor]),
])
This produces a traceplot that seems acceptible:
The model: As @KaiLondenberg responded, the approach you have taken with priors on total_lambda_tau
and total_lambda_mu
is not a standard approach. You describe widely varying event totals (1,000 one hour and 1,000,000 the next) but your model posits it to be normally distributed. In spatial epidemiology, the approach I have seen for analogous data is a model more like this:
import pymc as pm, theano.tensor as T
with Model() as success_model:
loss_lambda_rate = pm.Flat('loss_lambda_rate')
error = Poisson('error', mu=totalCounts*T.exp(loss_lambda_rate),
observed=successCounts)
I'm sure that there are other ways that will seem more familiar in other research communities as well.
Here is a notebook collecting up these comments.
I see several potential problems with the model.
1.) I would think that the success counts (called error ?) should follow a Binomial(n=total,p=loss_lambda_factor) distribution, not a Poisson.
2.) Where does the chain start ? Starting at a MAP or MLE configuration would make sense unless you use pure Gibbs sampling. Otherwise the chain might take a long time for burn-in, which might be what's happening here.
3.) Your choice of a hierarchical prior for total_lambda (i.e. normal with two uniform priors on those parameters) ensures that the chain will take a long time to converge, unless you pick your start wisely (as in Point 2.). You essentially introduce a lot of unneccessary degrees of freedom for the MCMC chain to get lost in. Given that total_lambda has to be nonngative, I would choose a Uniform prior for total_lambda in a suitable range (from 0 to the observed maximum for example).
4.) You use the Metropolis Sampler. 20000 samples might not be enough for that one. Try 60000 and discard the first 20000 as burn-in. The Metropolis Sampler can take a while to tune the step size, so it might well be that it spent the first 20000 samples to mainly reject proposals and tune. Try other samplers, such as NUTS.