
Embarrassingly Parallel Bayes

The use of Markov chain Monte Carlo (MCMC) for Bayesian inference has exploded over the past three decades, driven by major improvements in computational power. These methods are particularly well suited to analyses with a hierarchical structure, for example a cross-section of consumers or geographic markets, and to incorporating data at different levels of aggregation.

Recently, researchers in industry and academia have considered scaling analysis to distributed data and more efficient master/slave processing configurations. Two approaches worth noting are “Consensus Monte Carlo” (Scott et al., 2013) and “Asymptotically Exact, Embarrassingly Parallel MCMC” (Neiswanger et al., 2014). While the latter has more desirable theoretical properties, the two share the same intuition for distributed computing: each worker samples from a subposterior conditioned only on its own shard of the data, and the resulting draws are combined afterward. Both therefore lend themselves to embarrassingly parallel (MapReduce, or chained MapReduce) processing. In recent work with colleagues, I’ve seen computational improvements in excess of worker scale (speedups larger than the number of workers) because burn-in is parallelized as well. With Infrastructure as a Service available through AWS, Google Cloud, and others, larger data simply calls for a larger cluster to avoid the slowdown that data size would otherwise cause.
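To make the combination step concrete, here is a minimal Python/NumPy sketch (an illustration, not the MATLAB packages linked here) of how the two papers merge per-shard draws. Consensus Monte Carlo averages the g-th draw from every shard, weighting each by the inverse of that shard's sample covariance; the simplest ("parametric") rule in Neiswanger et al. fits a Gaussian to each subposterior and multiplies them. To keep it short and checkable, the sketch assumes a toy conjugate model (y_i ~ N(θ, I), θ ~ N(0, 100 I)) so each shard can be sampled exactly instead of by MCMC. The function names are hypothetical.

```python
import numpy as np


def subposterior_draws(y_shard, n_shards, prior_prec, n_draws, rng):
    """Exact draws from p(theta)^(1/S) * p(y_shard | theta) for the toy
    conjugate model y_i ~ N(theta, I), theta ~ N(0, prior_prec^-1).
    Stands in for the MCMC run each worker would do in practice."""
    d = y_shard.shape[1]
    prec = prior_prec / n_shards + len(y_shard) * np.eye(d)
    cov = np.linalg.inv(prec)
    mean = cov @ y_shard.sum(axis=0)
    return rng.multivariate_normal(mean, cov, size=n_draws)


def consensus_combine(draws_per_shard):
    """Consensus Monte Carlo: precision-weighted average of the g-th draw
    from every shard, with W_s = inverse sample covariance of shard s."""
    weights = [np.linalg.inv(np.cov(draws, rowvar=False)) for draws in draws_per_shard]
    total_cov = np.linalg.inv(sum(weights))
    weighted = sum(draws @ w for draws, w in zip(draws_per_shard, weights))
    return weighted @ total_cov  # row g: (sum_s W_s)^-1 sum_s W_s theta_s^(g)


def gaussian_product_combine(draws_per_shard, n_draws, rng):
    """Parametric rule from Neiswanger et al.: fit a Gaussian to each shard's
    draws, multiply the Gaussians, and sample from the product."""
    precs = [np.linalg.inv(np.cov(draws, rowvar=False)) for draws in draws_per_shard]
    cov = np.linalg.inv(sum(precs))
    mean = cov @ sum(p @ draws.mean(axis=0) for p, draws in zip(precs, draws_per_shard))
    return rng.multivariate_normal(mean, cov, size=n_draws)


rng = np.random.default_rng(0)
n, d, S, G = 10_000, 2, 4, 5_000      # observations, dimension, shards, draws
y = rng.normal([1.0, -2.0], 1.0, size=(n, d))
prior_prec = np.eye(d) / 100.0         # theta ~ N(0, 100 I)
shards = np.array_split(y, S)
shard_draws = [subposterior_draws(s, S, prior_prec, G, rng) for s in shards]

# Full-data posterior, available in closed form for this toy model
full_cov = np.linalg.inv(prior_prec + n * np.eye(d))
full_mean = full_cov @ y.sum(axis=0)

cmc = consensus_combine(shard_draws)
gp = gaussian_product_combine(shard_draws, G, rng)
print(full_mean, cmc.mean(axis=0), gp.mean(axis=0))
```

In this Gaussian toy model both rules recover the full-data posterior up to Monte Carlo error. For non-Gaussian subposteriors they are approximations, and it is the nonparametric and semiparametric rules in Neiswanger et al. that carry the asymptotic exactness guarantee.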

Here, I include two packages I’ve written in MATLAB. The first uses MATLAB’s new MapReduce functionality. It’s worth noting that MapReduce code written in MATLAB can be deployed directly to Hadoop clusters; I’ve also run it on AWS clusters. Alternatively, it runs just fine locally on a datastore, which reads data from the hard drive into memory in predefined chunks (shards). The second package uses MATLAB’s Parallel Computing Toolbox to apply the “Asymptotically Exact, Embarrassingly Parallel” algorithm.
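The map/reduce split for Consensus Monte Carlo can be sketched in Python (a hypothetical illustration, not the MATLAB package). Each `mapper` call runs its own burn-in and random-walk Metropolis chain for a logit model on one shard, using the prior raised to the power 1/S as in Scott et al., and emits the partial sums the consensus average needs; `reducer` just adds them. The shards are in-memory arrays standing in for datastore chunks, the built-in `reduce` stands in for a cluster's reduce phase, and the step size, iteration counts, and N(0, 10 I) prior are arbitrary choices for the toy data.

```python
import numpy as np
from functools import reduce


def log_post(beta, X, y, prior_prec, n_shards):
    """Shard log-posterior: logit log-likelihood plus the N(0, prior_prec^-1)
    log-prior scaled by 1/n_shards (so the S fractional priors multiply to one)."""
    eta = X @ beta
    loglik = np.sum(y * eta - np.logaddexp(0.0, eta))
    return loglik - 0.5 * beta @ prior_prec @ beta / n_shards


def mapper(shard, n_shards, prior_prec, n_iter=5000, burn=2000, step=0.05, seed=0):
    """Map step: random-walk Metropolis on one shard (burn-in included),
    then emit (W_s, draws @ W_s) for the consensus reducer."""
    X, y = shard
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    lp = log_post(beta, X, y, prior_prec, n_shards)
    draws = []
    for it in range(n_iter):
        prop = beta + step * rng.standard_normal(beta.size)
        lp_prop = log_post(prop, X, y, prior_prec, n_shards)
        if np.log(rng.uniform()) < lp_prop - lp:
            beta, lp = prop, lp_prop
        if it >= burn:
            draws.append(beta)
    draws = np.array(draws)
    W = np.linalg.inv(np.cov(draws, rowvar=False))
    return W, draws @ W


def reducer(a, b):
    """Reduce step: consensus partial sums simply add."""
    return a[0] + b[0], a[1] + b[1]


rng = np.random.default_rng(1)
n, S = 20_000, 4
true_beta = np.array([-0.5, 1.0])
X = np.column_stack([np.ones(n), rng.standard_normal(n)])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ true_beta))).astype(float)
prior_prec = np.eye(2) / 10.0          # beta ~ N(0, 10 I)
shards = list(zip(np.array_split(X, S), np.array_split(y, S)))

mapped = (mapper(shard, S, prior_prec, seed=i) for i, shard in enumerate(shards))
W_sum, weighted_sum = reduce(reducer, mapped)
consensus_draws = weighted_sum @ np.linalg.inv(W_sum)
print(consensus_draws.mean(axis=0))
```

Because the reducer only adds matrices, partial results can be combined in any order, which is what lets the consensus step fit MapReduce (or a chain of MapReduce jobs). Each mapper also performs its own burn-in, so burn-in is parallelized along with sampling.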

If you find any errors in my code, have suggestions for improvement, or just want to provide feedback, please let me know. Thanks.

Links:

MapReduce for Consensus Monte Carlo Algorithm for Logit

Asymptotically Exact Embarrassingly Parallel MCMC for Hierarchical Logit

References:

Neiswanger, W., Wang, C., & Xing, E. P. (2014). Asymptotically Exact, Embarrassingly Parallel MCMC. Conference on Uncertainty in Artificial Intelligence.

Scott, S. L., Blocker, A. W., Bonassi, F. V., Chipman, H. A., George, E. I., & McCulloch, R. E. (2013, October). Bayes and big data: The consensus Monte Carlo algorithm. In EFaBBayes 250 conference (Vol. 16).