MC-Stan on Spark?

非 Y 不嫁゛ 提交于 2019-12-03 09:37:29

Yes it's certainly possible but requires a bit more work. Stan (and popular MCMC tools that I know of) are not designed to be run in a distributed setting, via Spark or otherwise. In general, distributed MCMC is an area of active research. For a recent review, I'd recommend section 4 of Patterns of Scalable Bayesian Inference (PoFSBI). There are multiple possible ways you might want to split up a big MCMC computation but I think one of the more straightforward ways would be splitting up the data and running an off-the-shelf tool like Stan, with the same model, on each partition. Each model will produce a subposterior which can be reduce'd together to form a posterior. PoFSBI discusses several ways of combining such subposteriors.

I've put together a very rough proof of concept using pyspark and pystan (python is the common language with the most Stan and Spark support). It's a rough and limited implementation of the weighted-average consensus algorithm in PoFSBI, running on the tiny 8-schools dataset. I don't think this example would be practically very useful but it should provide some idea of what might be necessary to run Stan as a Spark program: partition data, run stan on each partition, combine the subposteriors.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!