I have some observational data for which I would like to estimate parameters, and I thought it would be a good opportunity to try out PYMC3.
My data is structured
I see several potential problems with the model.
1.) I would think that the success counts (called error ?) should follow a Binomial(n=total,p=loss_lambda_factor) distribution, not a Poisson.
2.) Where does the chain start ? Starting at a MAP or MLE configuration would make sense unless you use pure Gibbs sampling. Otherwise the chain might take a long time for burn-in, which might be what's happening here.
3.) Your choice of a hierarchical prior for total_lambda (i.e. normal with two uniform priors on those parameters) ensures that the chain will take a long time to converge, unless you pick your start wisely (as in Point 2.). You essentially introduce a lot of unneccessary degrees of freedom for the MCMC chain to get lost in. Given that total_lambda has to be nonngative, I would choose a Uniform prior for total_lambda in a suitable range (from 0 to the observed maximum for example).
4.) You use the Metropolis Sampler. 20000 samples might not be enough for that one. Try 60000 and discard the first 20000 as burn-in. The Metropolis Sampler can take a while to tune the step size, so it might well be that it spent the first 20000 samples to mainly reject proposals and tune. Try other samplers, such as NUTS.