DistanceDependentCRP.jl

DistanceDependentCRP.jl is a Julia package for Bayesian nonparametric clustering using the Distance Dependent Chinese Restaurant Process (DDCRP).

What is the DDCRP?

The standard Chinese Restaurant Process (CRP) places a prior over partitions of n observations, favouring partitions with few, large clusters. The DDCRP extends this by making cluster membership depend on the distances between observations: each customer i links to another customer j with probability proportional to a decay function f(d_{ij}) of their distance, or to themselves with probability proportional to the concentration parameter α. When customer i links to customer j, i inherits j's cluster. The resulting partition is given by the connected components of the customer link graph, ignoring the direction of the links.

The default decay function is exponential:

f(d; scale) = exp(-d × scale)
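
As a rough illustration of how the decay function turns distances into prior link probabilities, here is a minimal sketch. The names `exp_decay` and `link_probabilities` are hypothetical and are not taken from the package API. The sketch assumes a dense distance matrix `D` and a self-link weight equal to α.

```julia
# Hypothetical helper names; not part of the package API.

# Exponential decay: closer observations get a larger weight.
exp_decay(d; scale=1.0) = exp(-d * scale)

# Prior probability that customer i links to each customer j.
# D is an n×n distance matrix, α the concentration parameter.
# The self-link (j == i) gets weight α instead of a decay value.
function link_probabilities(D::AbstractMatrix, i::Integer, α::Real; scale=1.0)
    n = size(D, 1)
    w = [j == i ? float(α) : exp_decay(D[i, j]; scale=scale) for j in 1:n]
    return w ./ sum(w)   # normalise the weights to probabilities
end

D = [0.0 1.0 4.0;
     1.0 0.0 3.0;
     4.0 3.0 0.0]
p = link_probabilities(D, 1, 1.0)
```

In this toy example customer 1 is more likely to link to customer 2 (distance 1) than to customer 3 (distance 4), and a larger `scale` sharpens that preference.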

A self-link (c[i] = i) acts as a table head: the customer starts their own cluster, which other customers can join by linking to them. The concentration parameter α controls the prior probability of self-linking, so a higher α tends to produce more clusters.
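
To make the mapping from links to clusters concrete, the following sketch recovers cluster labels from a link vector `c` by finding connected components with a small union-find. The function name `clusters_from_links` is hypothetical, and the package may represent or compute this differently.

```julia
# Hypothetical helper; not part of the package API.
# c[i] is the customer that i links to (c[i] == i is a self-link).
function clusters_from_links(c::AbstractVector{<:Integer})
    n = length(c)
    parent = collect(1:n)
    # Find the representative of x's component, with path halving.
    function find(x)
        while parent[x] != x
            parent[x] = parent[parent[x]]
            x = parent[x]
        end
        return x
    end
    # Each link i -> c[i] merges the two components it touches.
    for i in 1:n
        ri, rj = find(i), find(c[i])
        ri != rj && (parent[ri] = rj)
    end
    # Relabel components 1, 2, ... in order of first appearance.
    labels = Dict{Int,Int}()
    return [get!(labels, find(i), length(labels) + 1) for i in 1:n]
end

c = [1, 1, 2, 5, 4, 6]
z = clusters_from_links(c)   # z == [1, 1, 1, 2, 2, 3]
```

Customers 1 to 3 form a chain ending in the self-link at customer 1, customers 4 and 5 link to each other, and customer 6 sits alone, so the link vector yields three clusters.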

Available Models

| Model | Likelihood | Parameters | Inference |
|---|---|---|---|
| PoissonClusterRates | Poisson | Cluster rates λ_k | RJMCMC |
| PoissonClusterRatesMarg | Poisson | (marginalised) | Gibbs |
| PoissonPopulationRates | Poisson + exposure | Cluster rates ρ_k | RJMCMC |
| PoissonPopulationRatesMarg | Poisson + exposure | (marginalised) | Gibbs |
| BinomialClusterProb | Binomial | Cluster probabilities p_k | RJMCMC |
| BinomialClusterProbMarg | Binomial | (marginalised) | Gibbs |
| GammaClusterShapeMarg | Gamma | Cluster shapes α_k | RJMCMC |

Marginalised variants integrate out the cluster parameters analytically and use Gibbs sampling. They are typically faster and mix better, but they do not provide posterior samples of the cluster parameters themselves. Non-marginalised variants carry explicit cluster parameters and use Reversible Jump MCMC (RJMCMC).
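
To illustrate what integrating out a cluster parameter can look like, here is a sketch of the closed-form log marginal likelihood for Poisson counts in a single cluster. It assumes a conjugate Gamma(a, b) prior (shape a, rate b) on the cluster rate. That prior and the function name `log_marginal_poisson` are assumptions made for illustration, not details confirmed by the package.

```julia
# Hypothetical helper; the Gamma(a, b) prior (shape a, rate b) is an assumption.
# Log marginal likelihood of the counts y in one cluster under
#   y_i ~ Poisson(λ),  λ ~ Gamma(a, b),
# with λ integrated out analytically.
function log_marginal_poisson(y::AbstractVector{<:Integer}, a::Real, b::Real)
    n, S = length(y), sum(y)
    # log Γ(a + S) - log Γ(a), exact because S is a non-negative integer
    lgamma_ratio = 0.0
    for j in 0:S-1
        lgamma_ratio += log(a + j)
    end
    # Σ_i log(y_i!)
    log_fact = 0.0
    for yi in y, k in 2:yi
        log_fact += log(k)
    end
    return a * log(b) - (a + S) * log(b + n) + lgamma_ratio - log_fact
end

# With a = b = 1 a single observation is geometric: P(y = 0) = 0.5, P(y = 1) = 0.25.
m0 = exp(log_marginal_poisson([0], 1.0, 1.0))
m1 = exp(log_marginal_poisson([1], 1.0, 1.0))
```

Because the rate never appears explicitly, a sampler built on such a quantity only has to update the links, which is why no posterior samples of the cluster parameters come out of it.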