Map/Reduce Uncollapsed Gibbs Sampling for Bayesian Nonparametric Models
We propose a Map/Reduce framework for parallel Gibbs sampling in Bayesian nonparametric (BNP) models. Following ideas proposed in the related literature, our approach samples all the hidden variables explicitly in order to break the dependencies among them. We present the framework in a form that is easy to generalize and readily applicable to any BNP model whose likelihood function belongs to the exponential family. We release our Spark/Scala implementation for the Dirichlet and Beta processes, which runs one iteration over 50M observations in under two minutes.
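To make the map/reduce structure concrete, the following is a minimal sketch in plain Scala, not the authors' released code. It assumes a truncated stick-breaking Dirichlet-process mixture of one-dimensional Gaussians with known variance, sampled with a blocked Gibbs scheme in the style of Ishwaran and James; this truncation is swapped in for illustration, since the abstract does not say how new components are handled. All names (`UncollapsedGibbsSketch`, `mapReduce`, `globalStep`, `truncation`, and so on) are hypothetical. Because the weights and means are sampled explicitly, each observation's assignment depends only on these global parameters, so the map stage treats observations independently. Because the likelihood is in the exponential family, the reduce stage only has to add up sufficient statistics.

```scala
import scala.util.Random

// Hypothetical sketch (not the authors' code): one blocked Gibbs iteration for a
// truncated stick-breaking DP mixture of 1-D Gaussians with known variance sigma2
// and a Normal(mu0, tau02) prior on the component means.
object UncollapsedGibbsSketch {
  // Sufficient statistics of a Gaussian with known variance; they combine by addition.
  final case class Stats(count: Long, sum: Double) {
    def +(o: Stats): Stats = Stats(count + o.count, sum + o.sum)
  }
  final case class State(logWeights: Array[Double], means: Array[Double])

  // Marsaglia-Tsang Gamma(shape, 1) sampler, boosted for shape < 1.
  def sampleGamma(shape: Double, rng: Random): Double =
    if (shape < 1.0) sampleGamma(shape + 1.0, rng) * math.pow(1.0 - rng.nextDouble(), 1.0 / shape)
    else {
      val d = shape - 1.0 / 3.0
      val c = 1.0 / math.sqrt(9.0 * d)
      var result = -1.0
      while (result < 0.0) {
        val x = rng.nextGaussian()
        val v = math.pow(1.0 + c * x, 3)
        if (v > 0.0 && math.log(1.0 - rng.nextDouble()) < 0.5 * x * x + d - d * v + d * math.log(v))
          result = d * v
      }
      result
    }

  def sampleBeta(a: Double, b: Double, rng: Random): Double = {
    val x = sampleGamma(a, rng)
    x / (x + sampleGamma(b, rng))
  }

  // Draw an index with probability proportional to exp(logW), shifted by the max for stability.
  def sampleCategorical(logW: Array[Double], rng: Random): Int = {
    val m = logW.max
    val w = logW.map(l => math.exp(l - m))
    var u = rng.nextDouble() * w.sum
    var k = 0
    while (k < w.length - 1 && u >= w(k)) { u -= w(k); k += 1 }
    k
  }

  // Map + reduce: given the sampled weights and means, assignments are conditionally
  // independent, so each observation is processed on its own and only the
  // per-component sufficient statistics are combined.
  def mapReduce(data: Seq[Double], s: State, sigma2: Double, rng: Random): Map[Int, Stats] =
    data
      .map { x =>
        val logP = Array.tabulate(s.means.length) { j =>
          s.logWeights(j) - 0.5 * (x - s.means(j)) * (x - s.means(j)) / sigma2
        }
        (sampleCategorical(logP, rng), Stats(1L, x))
      }
      .groupMapReduce(_._1)(_._2)(_ + _)

  // Global step: resample stick-breaking weights and means from their conjugate
  // posteriors, using only the reduced statistics.
  def globalStep(stats: Map[Int, Stats], truncation: Int, alpha: Double, sigma2: Double,
                 mu0: Double, tau02: Double, rng: Random): State = {
    val n = Array.tabulate(truncation)(j => stats.get(j).fold(0.0)(_.count.toDouble))
    val sums = Array.tabulate(truncation)(j => stats.get(j).fold(0.0)(_.sum))
    val tail = n.scanRight(0.0)(_ + _).drop(1) // tail(j) = sum of n(i) for i > j
    val logW = new Array[Double](truncation)
    var logRest = 0.0 // log of prod_{i < j} (1 - v_i)
    for (j <- 0 until truncation) {
      val v = if (j == truncation - 1) 1.0 else sampleBeta(1.0 + n(j), alpha + tail(j), rng)
      logW(j) = logRest + math.log(v)
      logRest += math.log1p(-v)
    }
    val means = Array.tabulate(truncation) { j =>
      val prec = 1.0 / tau02 + n(j) / sigma2
      (mu0 / tau02 + sums(j) / sigma2) / prec + rng.nextGaussian() / math.sqrt(prec)
    }
    State(logW, means)
  }
}
```

On a Spark cluster, the local `Seq` would plausibly become an RDD, `groupMapReduce` would become `map` followed by `reduceByKey(_ + _)`, and the small `State` would be broadcast to the workers. A single driver-side `Random` captured in the closure would be copied to every task with the same internal state, so each partition would need its own seeded generator.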