API Reference

Complete listing of all exported types and functions.

Abstract Types

DistanceDependentCRP.PoissonModel - Type
PoissonModel <: LikelihoodModel

Abstract type for Poisson likelihood models. Concrete subtypes: PoissonClusterRates, PoissonClusterRatesMarg, PoissonPopulationRates, PoissonPopulationRatesMarg.


Data Containers

DistanceDependentCRP.CountData - Type
CountData{Ty, Td} <: AbstractObservedData

Observed count data for Poisson and Binomial models.

Fields

  • y::Ty: Observed counts (AbstractVector)
  • D::Td: Distance matrix (AbstractMatrix)

DistanceDependentCRP.CountDataWithTrials - Type
CountDataWithTrials{Ty, Tn, Td} <: AbstractObservedData

Observed count data with number of trials for Binomial models.

Fields

  • y::Ty: Observed successes (AbstractVector)
  • N::Tn: Number of trials (scalar Int or AbstractVector{Int})
  • D::Td: Distance matrix (AbstractMatrix)

DistanceDependentCRP.CountDataWithPopulation - Type
CountDataWithPopulation{Ty, Tp, Td} <: AbstractObservedData

Observed count data with population/exposure offsets for Poisson/NB population models.

Fields

  • y::Ty: Observed counts (AbstractVector)
  • P::Tp: Population or exposure (scalar or AbstractVector{<:Real})
  • D::Td: Distance matrix (AbstractMatrix)
  • missing_mask::BitVector: true for indices with missing observations (default: all false)

DistanceDependentCRP.ContinuousData - Type
ContinuousData{Ty, Td} <: AbstractObservedData

Observed continuous data for continuous-valued models (e.g. Gamma).

Fields

  • y::Ty: Observed values (AbstractVector{<:Real})
  • D::Td: Distance matrix (AbstractMatrix)

DDCRP Parameters and Options

DistanceDependentCRP.DDCRPParams - Type
DDCRPParams{T<:Real}

DDCRP hyperparameters (shared across models).

Fields

  • α::T: Concentration parameter (unnormalised self-link weight; the self-link probability is proportional to α)
  • scale::T: Distance decay scale parameter
  • decay_fn::Function: Decay function (default: exponential)
  • α_a::Union{T,Nothing}: Gamma shape prior for α (nothing = don't infer)
  • α_b::Union{T,Nothing}: Gamma rate prior for α
  • s_a::Union{T,Nothing}: Gamma shape prior for scale s (nothing = don't infer)
  • s_b::Union{T,Nothing}: Gamma rate prior for scale s

DistanceDependentCRP.MCMCOptions - Type
MCMCOptions

Configuration for MCMC sampling. Birth proposals and fixed-dimension proposals are passed directly to mcmc as arguments, not through options.

Fields

  • n_samples::Int: Number of MCMC iterations (default: 10000)
  • verbose::Bool: Print progress (default: false)
  • infer_params::Dict{Symbol, Bool}: Parameters to explicitly disable inference for (default: empty — all parameters inferred)
  • prop_sds::Dict{Symbol, Float64}: Proposal standard deviations for MH parameter updates
  • track_diagnostics::Bool: Track acceptance rates (default: true)
  • track_pairwise::Bool: Track pairwise proposals (default: false)

Birth Proposals

DistanceDependentCRP.BirthProposal - Type
BirthProposal

Abstract supertype for RJMCMC birth proposal distributions. Controls how new cluster parameters are proposed when clusters split. Proposal objects are passed directly to mcmc and carry their own configuration.

DistanceDependentCRP.ConjugateProposal - Type
ConjugateProposal <: BirthProposal

Marker type indicating the model has conjugate cluster parameters. When used, update_c! dispatches to Gibbs sampling for assignments instead of RJMCMC, and cluster parameters are resampled from their conjugate posteriors after assignment updates.

DistanceDependentCRP.NormalMomentMatch - Type
NormalMomentMatch <: MomentMatchedProposal

Sample new cluster parameters from a truncated Normal centered at the empirical mean.

Fields

  • σ::Vector{Float64}: One proposal std per cluster parameter

DistanceDependentCRP.InverseGammaMomentMatch - Type
InverseGammaMomentMatch <: MomentMatchedProposal

Fit InverseGamma to data in moving set via method of moments. Falls back to prior if moment matching fails.

Fields

  • min_size::Int: Minimum cluster size to attempt moment matching

DistanceDependentCRP.LogNormalMomentMatch - Type
LogNormalMomentMatch <: MomentMatchedProposal

Sample on log-scale using a moment-matched LogNormal proposal. For each parameter, proposes log(θ) ~ Normal(log(θ_est), σ), where θ_est is a moment-based estimate.

Fields

  • σ::Vector{Float64}: One proposal std per cluster parameter (on log-scale)
  • min_size::Int: Minimum cluster size for moment estimation
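
The log-scale proposal described above can be sketched with the Julia standard library alone. The helper names `lognormal_logpdf` and `propose_lognormal`, the scalar σ, and the use of the sample mean as θ_est are assumptions made for illustration; they are not taken from the package.

```julia
using Random, Statistics

# Hypothetical helper: log density of θ when log(θ) ~ Normal(log(θ_est), σ).
# The -log(θ) term is the Jacobian of the log transform.
function lognormal_logpdf(θ::Real, θ_est::Real, σ::Real)
    z = (log(θ) - log(θ_est)) / σ
    return -log(θ) - log(σ) - 0.5 * log(2π) - 0.5 * z^2
end

# Hypothetical helper: propose θ for the observations in a moving set.
# Assumption: θ_est is the sample mean of the (positive) observations.
function propose_lognormal(rng::AbstractRNG, y_moving::AbstractVector{<:Real}, σ::Real)
    θ_est = mean(y_moving)
    θ = exp(log(θ_est) + σ * randn(rng))
    return θ, lognormal_logpdf(θ, θ_est, σ)
end

rng = MersenneTwister(1)
θ_new, log_q = propose_lognormal(rng, [2.0, 3.0, 4.0], 0.5)
```

The returned log density is the quantity a forward Hastings term would need; the same function evaluated at an existing parameter value would give the reverse term.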

DistanceDependentCRP.MixedProposal - Type
MixedProposal{T<:NamedTuple} <: BirthProposal

Compose per-parameter birth proposals. Each cluster parameter can use a different proposal strategy. The proposals field is a NamedTuple mapping parameter names (e.g. λ and α in the example below) to individual BirthProposal instances.

Dispatches to sample_birth_param and birth_param_logpdf for each parameter, which are implemented per (model, parameter, proposal) combination in each model file.

Example

MixedProposal(
    λ = LogNormalMomentMatch(0.5),
    α = NormalMomentMatch(0.5)
)

Fixed-Dimension Proposals

DistanceDependentCRP.FixedDimensionProposal - Type
FixedDimensionProposal

Abstract supertype for RJMCMC fixed-dimension proposal distributions. Controls how cluster parameters are updated when the moving set S_i transfers between existing clusters without changing the total number of clusters K.

DistanceDependentCRP.NoUpdate - Type
NoUpdate <: FixedDimensionProposal

Keep existing cluster parameters unchanged during fixed-dimension moves. The acceptance probability depends solely on the posterior ratio.

DistanceDependentCRP.WeightedMean - Type
WeightedMean <: FixedDimensionProposal

Deterministically update parameters as weighted averages of cluster contents. For a parameter ρ, the augmented cluster gets a weighted mean incorporating the moving set, and the depleted cluster is adjusted accordingly. The update is deterministic, so the log proposal ratio is zero (lpr = 0) and the Jacobian is unity.

DistanceDependentCRP.Resample - Type
Resample{P<:BirthProposal} <: FixedDimensionProposal

Stochastically resample cluster parameters for the modified clusters using an inner BirthProposal. Reuses sample_birth_param/birth_param_logpdf applied to the new cluster memberships (remaining depleted, augmented). The Hastings ratio accounts for the forward and reverse proposal densities.

Fields

  • proposal::P: The birth proposal to use for resampling

Example

Resample(NormalMomentMatch(0.5, 0.3, 0.5))  # moment-matched resampling
Resample()                                    # prior-based resampling

DistanceDependentCRP.MixedFixedDim - Type
MixedFixedDim{T<:NamedTuple} <: FixedDimensionProposal

Compose per-parameter fixed-dimension proposals. Each cluster parameter can use a different update strategy. The proposals field is a NamedTuple mapping parameter names to individual FixedDimensionProposal instances. Unspecified parameters default to NoUpdate.

Example

MixedFixedDim(ξ = WeightedMean(), ω = NoUpdate(), α = NoUpdate())

Poisson Models

DistanceDependentCRP.PoissonClusterRates - Type
PoissonClusterRates <: PoissonModel

Poisson model with explicit cluster-specific rates. Rates λ_k are maintained and updated via conjugate Gibbs sampling.

Parameters:

  • c: Customer assignments
  • λ_k: Cluster rates (cluster-level)

DistanceDependentCRP.PoissonClusterRatesState - Type
PoissonClusterRatesState{T<:Real} <: AbstractMCMCState{T}

State for PoissonClusterRates model.

Fields

  • c::Vector{Int}: Customer assignments (link representation)
  • λ_dict::Dict{Vector{Int}, T}: Table -> cluster rate mapping

DistanceDependentCRP.PoissonClusterRatesSamples - Type
PoissonClusterRatesSamples{T<:Real} <: AbstractMCMCSamples

MCMC samples container for PoissonClusterRates model.

Fields

  • c::Matrix{Int}: Customer assignments (n_samples x n_obs)
  • λ::Matrix{T}: Cluster rates per observation (n_samples x n_obs)
  • logpost::Vector{T}: Log-posterior values (n_samples)

DistanceDependentCRP.PoissonClusterRatesMarg - Type
PoissonClusterRatesMarg <: PoissonModel

Poisson model with cluster rates marginalised out. Uses Gamma-Poisson conjugacy for closed-form marginal likelihood.

Parameters:

  • c: Customer assignments only

DistanceDependentCRP.PoissonClusterRatesMargSamples - Type
PoissonClusterRatesMargSamples{T<:Real} <: AbstractMCMCSamples

MCMC samples container for PoissonClusterRatesMarg model.

Fields

  • c::Matrix{Int}: Customer assignments (n_samples x n_obs)
  • logpost::Vector{T}: Log-posterior values (n_samples)

DistanceDependentCRP.PoissonPopulationRates - Type
PoissonPopulationRates <: PoissonModel

Poisson model with population/exposure adjustment. Rate for observation i in cluster k is λ_i = P_i * ρ_k.

Parameters:

  • c: Customer assignments
  • ρ_k: Cluster rate multipliers (cluster-level)

Requires exposure data P_i for each observation.

DistanceDependentCRP.PoissonPopulationRatesState - Type
PoissonPopulationRatesState{T<:Real} <: AbstractMCMCState{T}

State for PoissonPopulationRates model.

Fields

  • c::Vector{Int}: Customer assignments (link representation)
  • ρ_dict::Dict{Vector{Int}, T}: Table -> cluster rate multiplier mapping

DistanceDependentCRP.PoissonPopulationRatesPriors - Type
PoissonPopulationRatesPriors{T<:Real} <: AbstractPriors

Prior specification for PoissonPopulationRates model.

Fields

  • ρ_a::T: Gamma shape parameter for rate multiplier ρ
  • ρ_b::T: Gamma rate parameter for rate multiplier ρ

DistanceDependentCRP.PoissonPopulationRatesSamples - Type
PoissonPopulationRatesSamples{T<:Real} <: AbstractMCMCSamples

MCMC samples container for PoissonPopulationRates model.

Fields

  • c::Matrix{Int}: Customer assignments (n_samples x n_obs)
  • ρ::Matrix{T}: Cluster rate multipliers per observation (n_samples x n_obs)
  • logpost::Vector{T}: Log-posterior values (n_samples)

DistanceDependentCRP.PoissonPopulationRatesMarg - Type
PoissonPopulationRatesMarg <: PoissonModel

Poisson model with population/exposure offsets and cluster rates marginalised out. Uses Gamma-Poisson conjugacy for a closed-form marginal likelihood.

Missing observations are excluded from the likelihood; their cluster assignments are updated using only the ddCRP prior.

Parameters:

  • c: Customer assignments only

Requires population data P_i for each observation via CountDataWithPopulation.

DistanceDependentCRP.PoissonPopulationRatesMargPriors - Type
PoissonPopulationRatesMargPriors{T<:Real} <: AbstractPriors

Prior specification for PoissonPopulationRatesMarg model.

Fields

  • ρ_a::T: Gamma shape parameter for cluster rate multiplier ρ
  • ρ_b::T: Gamma rate parameter for cluster rate multiplier ρ

DistanceDependentCRP.PoissonPopulationRatesMargSamples - Type
PoissonPopulationRatesMargSamples{T<:Real} <: AbstractMCMCSamples

MCMC samples container for PoissonPopulationRatesMarg model.

Fields

  • c::Matrix{Int}: Customer assignments (n_samples × n_obs)
  • logpost::Vector{T}: Log-posterior values (n_samples)
  • α_ddcrp::Vector{T}: DDCRP concentration samples (n_samples)
  • s_ddcrp::Vector{T}: DDCRP decay scale samples (n_samples)

Binomial Models

DistanceDependentCRP.BinomialClusterProb - Type
BinomialClusterProb <: BinomialModel

Binomial model with explicit cluster-specific success probabilities. Probabilities p_k are maintained and updated via conjugate Gibbs sampling.

Parameters:

  • c: Customer assignments
  • p_k: Cluster probabilities (cluster-level)

DistanceDependentCRP.BinomialClusterProbState - Type
BinomialClusterProbState{T<:Real} <: AbstractMCMCState{T}

State for BinomialClusterProb model.

Fields

  • c::Vector{Int}: Customer assignments (link representation)
  • p_dict::Dict{Vector{Int}, T}: Table -> cluster probability mapping

DistanceDependentCRP.BinomialClusterProbSamples - Type
BinomialClusterProbSamples{T<:Real} <: AbstractMCMCSamples

MCMC samples container for BinomialClusterProb model.

Fields

  • c::Matrix{Int}: Customer assignments (n_samples x n_obs)
  • p::Matrix{T}: Cluster probabilities per observation (n_samples x n_obs)
  • logpost::Vector{T}: Log-posterior values (n_samples)

DistanceDependentCRP.BinomialClusterProbMarg - Type
BinomialClusterProbMarg <: BinomialModel

Binomial model with cluster probabilities marginalised out. Uses Beta-Binomial conjugacy for closed-form marginal likelihood.

Parameters:

  • c: Customer assignments only

DistanceDependentCRP.BinomialClusterProbMargSamples - Type
BinomialClusterProbMargSamples{T<:Real} <: AbstractMCMCSamples

MCMC samples container for BinomialClusterProbMarg model.

Fields

  • c::Matrix{Int}: Customer assignments (n_samples x n_obs)
  • logpost::Vector{T}: Log-posterior values (n_samples)

Gamma Models

DistanceDependentCRP.GammaClusterShapeMarg - Type
GammaClusterShapeMarg <: GammaModel

Gamma model with cluster-specific shape parameters (α_k). Rate parameters (β_k) are marginalised out using Gamma-Gamma conjugacy.

Parameters:

  • α_k: Cluster shape parameters (cluster-level, explicit)
  • c: Customer assignments

Marginalised: β_k (cluster rate parameters integrated out analytically)

DistanceDependentCRP.GammaClusterShapeMargState - Type
GammaClusterShapeMargState{T<:Real} <: AbstractMCMCState{T}

State for GammaClusterShapeMarg model.

Fields

  • c::Vector{Int}: Customer assignments (link representation)
  • α_dict::Dict{Vector{Int}, T}: Table -> cluster shape parameter mapping

DistanceDependentCRP.GammaClusterShapeMargPriors - Type
GammaClusterShapeMargPriors{T<:Real} <: AbstractPriors

Prior specification for GammaClusterShapeMarg model.

Fields

  • α_a::T: Gamma shape parameter for α prior (shape of shape)
  • α_b::T: Gamma rate parameter for α prior
  • β_a::T: Gamma shape parameter for β prior (used in marginal likelihood)
  • β_b::T: Gamma rate parameter for β prior (used in marginal likelihood)

DistanceDependentCRP.GammaClusterShapeMargSamples - Type
GammaClusterShapeMargSamples{T<:Real} <: AbstractMCMCSamples

MCMC samples container for GammaClusterShapeMarg model.

Fields

  • c::Matrix{Int}: Customer assignments (n_samples x n_obs)
  • α::Matrix{T}: Shape per observation (n_samples x n_obs) - stores cluster α
  • logpost::Vector{T}: Log-posterior values (n_samples)

Main MCMC Entry Point

DistanceDependentCRP.mcmc - Function
mcmc(model, data, ddcrp_params, priors, proposal; fixed_dim_proposal, opts)

Main MCMC entry point. Dispatches based on model type.

Arguments

  • model::LikelihoodModel: The likelihood model (determines parameter structure)
  • data::AbstractObservedData: Observed data container
  • ddcrp_params::DDCRPParams: DDCRP hyperparameters
  • priors::AbstractPriors: Prior specification
  • proposal::BirthProposal: Birth proposal for RJMCMC (or ConjugateProposal for Gibbs)

Keyword Arguments

  • fixed_dim_proposal::FixedDimensionProposal: Fixed-dimension proposal (default: NoUpdate())
  • opts::MCMCOptions: MCMC configuration

Returns

  • Model-specific *Samples struct (subtype of AbstractMCMCSamples)
  • MCMCDiagnostics (optional): If opts.track_diagnostics is true

Convenience: CountData models (Poisson, NegBin) with separate y, D.


Convenience: CountDataWithTrials models (Binomial) or CountDataWithPopulation models with separate y, N/P, D.


Convenience: ContinuousData models (Gamma) with separate y, D.


Model Interface Methods

DistanceDependentCRP.initialise_state - Function
initialise_state(model::PoissonClusterRates, data, ddcrp_params, priors)

Create initial MCMC state for the model.

initialise_state(model::PoissonClusterRatesMarg, data, ddcrp_params, priors)

Create initial MCMC state for the model.

initialise_state(model::PoissonPopulationRates, data, ddcrp_params, priors)

Create initial MCMC state for the model.

initialise_state(model::PoissonPopulationRatesMarg, data, ddcrp_params, priors)

Create initial MCMC state. Assignments are drawn from the ddCRP prior.

initialise_state(model::BinomialClusterProb, data, ddcrp_params, priors)

Create initial MCMC state for the model.

initialise_state(model::BinomialClusterProbMarg, data, ddcrp_params, priors)

Create initial MCMC state for the model.

initialise_state(model::GammaClusterShapeMarg, data, ddcrp_params, priors)

Create initial MCMC state for the model. Initialises shape parameters using method of moments.

DistanceDependentCRP.allocate_samples - Function
allocate_samples(model::PoissonClusterRates, n_samples, n)

Allocate storage for MCMC samples.

allocate_samples(model::PoissonClusterRatesMarg, n_samples, n)

Allocate storage for MCMC samples.

allocate_samples(model::PoissonPopulationRates, n_samples, n)

Allocate storage for MCMC samples.

allocate_samples(model::PoissonPopulationRatesMarg, n_samples, n)

Allocate storage for MCMC samples.

allocate_samples(model::BinomialClusterProb, n_samples, n)

Allocate storage for MCMC samples.

allocate_samples(model::BinomialClusterProbMarg, n_samples, n)

Allocate storage for MCMC samples.

allocate_samples(model::GammaClusterShapeMarg, n_samples, n)

Allocate storage for MCMC samples.

DistanceDependentCRP.extract_samples! - Function
extract_samples!(model::PoissonClusterRates, state, samples, iter)

Extract current state into sample storage at iteration iter.

extract_samples!(model::PoissonClusterRatesMarg, state, samples, iter)

Extract current state into sample storage at iteration iter.

extract_samples!(model::PoissonPopulationRates, state, samples, iter)

Extract current state into sample storage at iteration iter.

extract_samples!(model::PoissonPopulationRatesMarg, state, samples, iter)

Extract current state into sample storage at iteration iter.

extract_samples!(model::BinomialClusterProb, state, samples, iter)

Extract current state into sample storage at iteration iter.

extract_samples!(model::BinomialClusterProbMarg, state, samples, iter)

Extract current state into sample storage at iteration iter.

extract_samples!(model::GammaClusterShapeMarg, state, samples, iter)

Extract current state into sample storage at iteration iter.

DistanceDependentCRP.update_params! - Function
update_params!(model::PoissonClusterRates, state, data, priors, tables, log_DDCRP, opts)

Update cluster rates via conjugate Gibbs sampling. Assignment updates are handled separately by update_c!.

update_params!(model::PoissonClusterRatesMarg, state, data, priors, tables, log_DDCRP, opts)

Update customer assignments. No other parameters to update - rates are marginalised out. Returns diagnostics information for assignment updates.

update_params!(model::PoissonPopulationRates, state, data, priors, tables, log_DDCRP, opts)

Update all model parameters.

update_params!(model::PoissonPopulationRatesMarg, state, data, priors, tables, log_DDCRP, opts)

No parameter updates needed — cluster rates are fully marginalised out. Assignment updates are handled by update_c!.

update_params!(model::BinomialClusterProb, state, data, priors, tables, log_DDCRP, opts)

Update cluster probabilities via conjugate Gibbs sampling. Assignment updates are handled separately by update_c!.

update_params!(model::BinomialClusterProbMarg, state, data, priors, tables, log_DDCRP, opts)

Update customer assignments. No other parameters to update - probabilities are marginalised out.

update_params!(model::GammaClusterShapeMarg, state, data, priors, tables, log_DDCRP, opts)

Update model parameters (α). Assignment updates are handled separately by update_c! in the main MCMC loop.

DistanceDependentCRP.table_contribution - Function
table_contribution(model::PoissonClusterRates, table, state, data, priors)

Compute log-contribution of a table with explicit cluster rate.

table_contribution(model::PoissonClusterRatesMarg, table, state, data, priors)

Compute log-contribution of a table with marginalised cluster rate. Uses Gamma-Poisson conjugacy for closed-form marginal.

table_contribution(model::PoissonPopulationRates, table, state, data, priors)

Compute log-contribution of a table with population-adjusted Poisson likelihood.

Arguments

  • data: CountDataWithPopulation containing y (counts) and P (exposures/populations)
table_contribution(model::PoissonPopulationRatesMarg, table, state, data, priors)

Compute log-contribution of a table after analytically marginalising ρ_k. Only observed (non-missing) members contribute to the likelihood.

Returns 0.0 for tables with no observed members (the Gamma prior integrates to 1, so the log-contribution is 0).

TC_k = Σ_{i∈k_obs} [y_i·log(P_i) − loggamma(y_i + 1)] + ρ_a·log(ρ_b) − loggamma(ρ_a) + loggamma(S_k + ρ_a) − (S_k + ρ_a)·log(P_k_total + ρ_b)

where S_k = Σ_{i∈k_obs} y_i and P_k_total = Σ_{i∈k_obs} P_i.

table_contribution(model::BinomialClusterProb, table, state, data, priors)

Compute log-contribution of a table with explicit cluster probability.

table_contribution(model::BinomialClusterProbMarg, table, state, data, priors)

Compute log-contribution of a table with marginalised cluster probability. Uses Beta-Binomial conjugacy for closed-form marginal.

table_contribution(model::GammaClusterShapeMarg, table, state, data, priors)

Compute log-contribution of a table with marginalised cluster rate. Uses Gamma-Gamma conjugacy to integrate out β.

The marginal likelihood for n observations y_1,...,y_n with shape α is:

log p(y | α, β_a, β_b) = (α - 1) * Σ_i log(y_i) - n * loggamma(α) + loggamma(n*α + β_a) - loggamma(β_a) + β_a * log(β_b) - (n*α + β_a) * log(Σ_i y_i + β_b)

DistanceDependentCRP.posterior - Function
posterior(model::PoissonClusterRates, data, state, priors, log_DDCRP)

Compute full log-posterior for Poisson model with explicit rates.

posterior(model::PoissonClusterRatesMarg, data, state, priors, log_DDCRP)

Compute full log-posterior for marginalised Poisson model.

posterior(model::PoissonPopulationRates, data, state, priors, log_DDCRP)

Compute full log-posterior for population-adjusted Poisson model.

posterior(model::PoissonPopulationRatesMarg, data, state, priors, log_DDCRP)

Compute full log-posterior for the marginalised Poisson population model.

posterior(model::BinomialClusterProb, data, state, priors, log_DDCRP)

Compute full log-posterior for Binomial model with explicit probabilities.

posterior(model::BinomialClusterProbMarg, data, state, priors, log_DDCRP)

Compute full log-posterior for marginalised Binomial model.

posterior(model::GammaClusterShapeMarg, data, state, priors, log_DDCRP)

Compute full log-posterior for Gamma model with marginalised rate.

DistanceDependentCRP.update_c! - Function
update_c!(model, state, data, priors, birth_proposal, fixed_dim_proposal, log_DDCRP, opts)

Generic assignment update dispatcher. Uses Gibbs sampling when the model is marginalised or the proposal is conjugate; otherwise uses RJMCMC.

Returns a diagnostics vector of (move_type, i, j_star, accepted) tuples.

DistanceDependentCRP.cluster_param_dicts - Function
cluster_param_dicts(state::AbstractMCMCState) -> NamedTuple

Return a NamedTuple of all cluster parameter dicts from the state. The first dict is used as the "primary" dict for table lookups.

Each model returns its specific set of dicts, e.g.:

  • (m = state.m_dict,) for 1-parameter models
  • (m = state.m_dict, r = state.r_dict) for 2-parameter models
  • (ξ = state.ξ_dict, ω = state.ω_dict, α = state.α_dict) for 3-parameter models
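
To make the shape of the return value concrete, here is a minimal sketch with a made-up two-parameter state. `ToyState` and `toy_cluster_param_dicts` are hypothetical names used only for illustration; the sketch assumes, as the state docstrings above suggest, that each dict is keyed by the table's member vector.

```julia
# Hypothetical two-parameter state; field names follow the examples above.
struct ToyState
    c::Vector{Int}
    m_dict::Dict{Vector{Int}, Float64}
    r_dict::Dict{Vector{Int}, Float64}
end

# Sketch of the interface: one dict per cluster parameter, keyed by table.
toy_cluster_param_dicts(state::ToyState) = (m = state.m_dict, r = state.r_dict)

state = ToyState([1, 1, 3],
                 Dict([1, 2] => 0.5, [3] => 2.0),
                 Dict([1, 2] => 1.5, [3] => 0.7))
dicts = toy_cluster_param_dicts(state)

# Scalar parameters of the table containing customers 1 and 2.
# map over a NamedTuple keeps the keys, giving (m = 0.5, r = 1.5).
params_table = map(d -> d[[1, 2]], dicts)
```

A NamedTuple of scalars with the same keys is also the shape that sample_birth_params is documented to return.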

DistanceDependentCRP.sample_birth_params - Function
sample_birth_params(model::LikelihoodModel, proposal::BirthProposal,
                    S_i::Vector{Int}, state::AbstractMCMCState,
                    data::AbstractObservedData, priors::AbstractPriors)
    -> (params_new::NamedTuple, log_q_forward::Float64)

Sample new cluster parameters for a birth move. params_new has the same keys as cluster_param_dicts but with scalar values.

Dispatches on both model type and proposal type.

DistanceDependentCRP.birth_params_logpdf - Function
birth_params_logpdf(model::LikelihoodModel, proposal::BirthProposal,
                    params_old::NamedTuple, S_i::Vector{Int},
                    state::AbstractMCMCState, data::AbstractObservedData,
                    priors::AbstractPriors) -> Float64

Log density of the birth proposal at the given parameter values. Used in death moves to compute the reverse Hastings ratio.

Dispatches on both model type and proposal type.

DistanceDependentCRP.sample_birth_param - Function
sample_birth_param(model::LikelihoodModel, ::Val{param_name},
                   proposal::BirthProposal, S_i::Vector{Int},
                   state::AbstractMCMCState, data::AbstractObservedData,
                   priors::AbstractPriors)
    -> (value, log_q_forward::Float64)

Sample a single cluster parameter for a birth move. Used by MixedProposal to dispatch each parameter independently.

Dispatches on model type, Val{param_name}, and proposal type. Each model implements this for the (parameter, proposal) combinations it supports.

DistanceDependentCRP.birth_param_logpdf - Function
birth_param_logpdf(model::LikelihoodModel, ::Val{param_name},
                   proposal::BirthProposal, param_value,
                   S_i::Vector{Int}, state::AbstractMCMCState,
                   data::AbstractObservedData, priors::AbstractPriors)
    -> Float64

Log density of the per-parameter birth proposal at param_value. Used by MixedProposal in death moves to compute the reverse Hastings ratio.

Dispatches on model type, Val{param_name}, and proposal type.

DistanceDependentCRP.fixed_dim_params - Function
fixed_dim_params(model, proposal::FixedDimensionProposal,
                 S_i, table_depl, table_aug, state, data, priors)
    -> (params_depleted::NamedTuple, params_augmented::NamedTuple, log_proposal_ratio::Float64)

Compute updated parameter values for a different-table fixed-dimension move. Dispatches per-parameter to fixed_dim_param for each key in cluster_param_dicts. Returns scalar NamedTuples for the depleted and augmented tables, plus the total log proposal ratio (lpr).

fixed_dim_params(model, proposal::MixedFixedDim, ...)

MixedFixedDim override: dispatches each parameter to its own per-parameter proposal. Parameters not present in proposal.proposals fall back to NoUpdate.

DistanceDependentCRP.fixed_dim_param - Function
fixed_dim_param(model, ::Val{name}, proposal, S_i, table_depl, table_aug, state, data, priors)
    -> (val_depleted, val_augmented, log_proposal_ratio::Float64)

Compute updated values for a single cluster parameter during a fixed-dimension move where the moving set S_i transfers from table_depl to table_aug.

Returns the new parameter values for the depleted and augmented tables, plus the log proposal ratio log q(reverse) - log q(forward). For deterministic updates (NoUpdate, WeightedMean) this ratio is 0.0. For stochastic updates (Resample) it is non-zero.

Dispatches on model type, Val{name} (the parameter name), and proposal type. Models can override for model-specific behaviour (e.g., using latent variables instead of raw observations for weighted-mean updates).


DDCRP Core Utilities

DistanceDependentCRP.simulate_ddcrp - Function
simulate_ddcrp(D; α=1.0, scale=1.0, decay_fn=decay)

Simulate customer assignments from the DDCRP prior. Each customer links to another customer with probability proportional to the distance decay, or to themselves with probability proportional to α.
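
A minimal sketch of this sampling step, assuming an exponential decay exp(-scale · d). The names `exp_decay` and `toy_simulate_ddcrp` are hypothetical stand-ins, not the package's functions.

```julia
using Random

# Hypothetical stand-in for the package's decay function (exponential decay).
exp_decay(d; scale = 1.0) = exp(-d * scale)

# Sketch: draw a link c[i] for each customer i with unnormalised weights
# α for a self-link and exp_decay(D[i, j]) for j ≠ i.
function toy_simulate_ddcrp(rng::AbstractRNG, D::AbstractMatrix; α = 1.0, scale = 1.0)
    n = size(D, 1)
    c = Vector{Int}(undef, n)
    for i in 1:n
        w = [j == i ? α : exp_decay(D[i, j]; scale = scale) for j in 1:n]
        u = rand(rng) * sum(w)
        # Inverse-CDF draw from the unnormalised weights.
        acc = 0.0
        c[i] = n   # fallback guards against floating-point round-off
        for j in 1:n
            acc += w[j]
            if u <= acc
                c[i] = j
                break
            end
        end
    end
    return c
end

x = [0.0, 0.1, 0.2, 5.0]
D = abs.(x .- x')          # pairwise distances between 1D positions
c = toy_simulate_ddcrp(MersenneTwister(42), D; α = 0.5)
```

Customers that are close together receive large link weights, so they tend to end up at the same table.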

DistanceDependentCRP.precompute_log_ddcrp - Function
precompute_log_ddcrp(f, α, scale, D)

Precompute log-DDCRP probability matrix.

  • Diagonal entries: log(α) (unnormalised self-link weight)
  • Off-diagonal entries: log(f(D[i,j]; scale)) (distance-based link)
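
As an illustration of the layout described above, the following sketch fills such a matrix directly. `exp_decay` and `toy_precompute_log_ddcrp` are hypothetical names, and the exponential decay is an assumption.

```julia
# Hypothetical stand-in for the decay function f.
exp_decay(d; scale = 1.0) = exp(-d * scale)

# Sketch: log(α) on the diagonal, log decay weights off the diagonal.
function toy_precompute_log_ddcrp(f, α, scale, D::AbstractMatrix)
    n = size(D, 1)
    L = Matrix{Float64}(undef, n, n)
    for i in 1:n, j in 1:n
        L[i, j] = i == j ? log(α) : log(f(D[i, j]; scale = scale))
    end
    return L
end

D = [0.0 1.0; 1.0 0.0]
L = toy_precompute_log_ddcrp(exp_decay, 0.5, 2.0, D)
```

The entries are unnormalised log weights; normalising row i means subtracting the log of the sum of the exponentiated row.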

DistanceDependentCRP.compute_table_assignments - Function
compute_table_assignments(c::Vector{Int}, force_self_loop::Int=0)

Convert customer link vector c to table assignment labels. Uses cycle detection in the link graph.

Arguments

  • c: Customer assignment vector where c[i] is the customer i links to
  • force_self_loop: If > 0, treat customer at this index as having a self-loop

Returns

  • Vector of table IDs (cluster labels) for each customer
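
The mapping from links to tables can be sketched as follows. This version uses union-find over the link graph rather than the cycle detection mentioned above, so it yields the same partition but may number the tables differently; `toy_table_assignments` is a hypothetical name.

```julia
# Sketch: tables are the weakly connected components of the link graph.
function toy_table_assignments(c::Vector{Int}, force_self_loop::Int = 0)
    n = length(c)
    parent = collect(1:n)

    # Follow parent pointers to the representative of i's component.
    function find(i)
        while parent[i] != i
            i = parent[i]
        end
        return i
    end

    for i in 1:n
        target = i == force_self_loop ? i : c[i]
        ri, rj = find(i), find(target)
        if ri != rj
            parent[ri] = rj
        end
    end

    # Relabel components 1, 2, ... in order of first appearance.
    labels = zeros(Int, n)
    root_label = Dict{Int, Int}()
    next_label = 0
    for i in 1:n
        r = find(i)
        if !haskey(root_label, r)
            next_label += 1
            root_label[r] = next_label
        end
        labels[i] = root_label[r]
    end
    return labels
end

c = [2, 1, 3, 3, 4]
tables = toy_table_assignments(c)         # customers {1,2} and {3,4,5}
tables_cut = toy_table_assignments(c, 4)  # cutting 4's link splits {3} from {4,5}
```

Passing a customer index as the second argument mimics temporarily removing that customer's link, which is the configuration table_vector_minus_i is described as computing.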

DistanceDependentCRP.table_vector_minus_i - Function
table_vector_minus_i(i, c)

Compute table configuration after temporarily removing customer i's link. Customer i is treated as having a self-loop.

Used in Gibbs sampling to evaluate alternative configurations.


DDCRP Hyperparameter Sampling

DistanceDependentCRP.compute_R - Function
compute_R(s, D)

Compute R_i = Σ_{j≠i} exp(-s · d_{ij}) for all i. These are the unnormalised total link weights from observation i to all others.
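
A direct transcription of this sum, as a sketch (`toy_compute_R` is a hypothetical name):

```julia
# Sketch: R[i] = Σ_{j≠i} exp(-s * D[i, j]).
function toy_compute_R(s::Real, D::AbstractMatrix)
    n = size(D, 1)
    return [sum((exp(-s * D[i, j]) for j in 1:n if j != i); init = 0.0) for i in 1:n]
end

D = [0.0 1.0 2.0;
     1.0 0.0 1.0;
     2.0 1.0 0.0]
R = toy_compute_R(1.0, D)
```

The `init = 0.0` keyword keeps the sum defined when there is only one observation.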

DistanceDependentCRP.sample_V! - Function
sample_V!(V, α, R)

Sample auxiliary variables V_i ~ Exponential(rate = α + R_i) in place. These are used for data-augmented Gibbs sampling of α.

Exponential(θ) in Distributions.jl uses the scale parameterisation, E[V] = θ, so V_i ~ Exponential(1/(α + R_i)) gives rate α + R_i.
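
The rate-versus-scale point can be sidestepped in a sketch by using `randexp` from the Random standard library, which draws from a unit-rate exponential. `toy_sample_V!` is a hypothetical name, not the package's function.

```julia
using Random

# Sketch: V[i] ~ Exponential with rate α + R[i].
# Dividing a unit-rate draw by the rate gives mean 1 / (α + R[i]).
function toy_sample_V!(rng::AbstractRNG, V::AbstractVector, α::Real, R::AbstractVector)
    for i in eachindex(V, R)
        V[i] = randexp(rng) / (α + R[i])
    end
    return V
end

V = zeros(3)
toy_sample_V!(MersenneTwister(7), V, 0.5, [1.0, 2.0, 3.0])
```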

DistanceDependentCRP.update_α_ddcrp - Function
update_α_ddcrp(n_self, V, ddcrp_params)

Exact Gibbs update for the DDCRP self-link parameter α.

Uses data augmentation: given auxiliary variables V_i ~ Exp(α + R_i), the conditional posterior is conjugate:

α | V, c, s ~ Gamma(a_α + n_self, 1 / (b_α + Σ_i V_i))

where (a_α, b_α) are the Gamma shape and rate prior parameters.

Arguments

  • n_self: Number of self-links in current assignment c
  • V: Auxiliary variable vector (length n), already sampled
  • ddcrp_params: DDCRPParams with prior fields α_a, α_b set

Returns

  • New sampled value of α

DistanceDependentCRP.update_s_ddcrp - Function
update_s_ddcrp(s, α, c, D, ddcrp_params, prop_sd)

MH update for the DDCRP distance scale s using the non-augmented likelihood. Used when α is not being inferred (no auxiliary variables available).

The log acceptance ratio is: log r = a_s · log(s'/s) − (b_s + D_sum) · (s' − s) − Σ_i [log Z_i(s') − log Z_i(s)]

where Z_i(s) = α + R_i(s) is the normalising constant for observation i and D_sum = Σ_{i: c_i ≠ i} d_{i,c_i}.

Arguments

  • s: Current scale value
  • α: Current (fixed) α value
  • c: Current customer assignment vector
  • D: Distance matrix
  • ddcrp_params: DDCRPParams with prior fields s_a, s_b set
  • prop_sd: Log-normal proposal standard deviation

Returns

  • New (accepted or rejected) value of s

DistanceDependentCRP.update_s_ddcrp_augmented - Function
update_s_ddcrp_augmented(s, α, V, R_current, c, D, ddcrp_params, prop_sd)

MH update for the DDCRP distance scale s using the data-augmented likelihood.

Uses a log-normal random walk proposal: s' = s · exp(ε), ε ~ N(0, σ²).

The log acceptance ratio (including Jacobian) is: log r = a_s · log(s'/s) − (b_s + D_sum) · (s' − s) − Σ_i V_i · [R_i(s') − R_i(s)]

where D_sum = Σ_{i: c_i ≠ i} d_{i,c_i} and R_i(s) = Σ_{j≠i} exp(-s·d_{ij}).

Arguments

  • s: Current scale value
  • α: Current α value
  • V: Auxiliary variables (already sampled), used in acceptance ratio
  • R_current: Pre-computed R_i values for current s
  • c: Current customer assignment vector
  • D: Distance matrix
  • ddcrp_params: DDCRPParams with prior fields s_a, s_b set
  • prop_sd: Log-normal proposal standard deviation

Returns

  • New (accepted or rejected) value of s
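
The acceptance ratio for the data-augmented update can be transcribed term by term. The sketch below assumes a Gamma(a_s, b_s) shape/rate prior and the log-normal random walk stated in the docstring; all `toy_` names are hypothetical and the argument list is simplified relative to the documented signature.

```julia
using Random

# R_i(s) = Σ_{j≠i} exp(-s * d_ij)
toy_R(s, D) = [sum((exp(-s * D[i, j]) for j in axes(D, 2) if j != i); init = 0.0)
               for i in axes(D, 1)]

# Log acceptance ratio for moving from s to s_new (Jacobian included
# through the a_s * log(s_new / s) term).
function toy_log_accept_augmented(s, s_new, V, c, D, a_s, b_s)
    # Total distance along the non-self links.
    D_sum = sum((D[i, c[i]] for i in eachindex(c) if c[i] != i); init = 0.0)
    ΔR = toy_R(s_new, D) .- toy_R(s, D)
    return a_s * log(s_new / s) - (b_s + D_sum) * (s_new - s) - sum(V .* ΔR)
end

# One Metropolis-Hastings step with proposal s' = s * exp(ε), ε ~ N(0, prop_sd²).
function toy_update_s(rng, s, V, c, D, a_s, b_s, prop_sd)
    s_new = s * exp(prop_sd * randn(rng))
    log_r = toy_log_accept_augmented(s, s_new, V, c, D, a_s, b_s)
    return log(rand(rng)) < log_r ? s_new : s
end

D = [0.0 1.0 2.0; 1.0 0.0 1.0; 2.0 1.0 0.0]
c = [1, 1, 2]
V = [0.3, 0.2, 0.1]
s_next = toy_update_s(MersenneTwister(3), 1.0, V, c, D, 2.0, 1.0, 0.2)
```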

Diagnostics

DistanceDependentCRP.MCMCSummary - Type
MCMCSummary

Summary statistics for an MCMC run. Model-agnostic: computes diagnostics for all available parameter fields in the samples struct.

Fields

  • acc_rates::NamedTuple: Acceptance rates for birth/death/fixed moves
  • ess_n_clusters::Float64: ESS for number of clusters
  • ess_logpost::Float64: ESS for log-posterior
  • ess_params::Dict{Symbol, Float64}: ESS for each parameter field
  • iat_n_clusters::Float64: IAT for number of clusters
  • iat_logpost::Float64: IAT for log-posterior
  • iat_params::Dict{Symbol, Float64}: IAT for each parameter field
  • total_time::Float64: Total MCMC runtime in seconds
  • ess_per_sec_n_clusters::Float64: ESS per second for number of clusters
  • total_proposals::Int: Total number of proposals
  • birth_fraction::Float64: Fraction of birth proposals
  • death_fraction::Float64: Fraction of death proposals
  • fixed_fraction::Float64: Fraction of fixed-dimension proposals
  • param_names::Vector{Symbol}: Names of parameter fields found
source
DistanceDependentCRP.integrated_autocorrelation_time - Function
integrated_autocorrelation_time(x::AbstractVector; max_lag=nothing, method=:initial_positive)

Compute the Integrated Autocorrelation Time (IAT):

τ = 1 + 2 * Σ_{k=1}^{K} ρ(k)

where ρ(k) is the lag-k autocorrelation and K is the truncation lag chosen by the method.

Methods

  • :simple - Sum until first negative autocorrelation
  • :initial_positive - Geyer's initial positive sequence estimator (recommended)
  • :batch - Batch means estimator
source
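As an illustration of the :simple rule (sum until the first negative autocorrelation), a minimal sketch might look like the following. The name `iat_simple_sketch` is hypothetical, and the autocorrelation normalisation (dividing by n times the lag-0 variance) is an assumption; the package's estimator may differ in detail.

```julia
using Statistics

# Hypothetical sketch of a "sum until first negative ρ(k)" IAT estimator.
function iat_simple_sketch(x::AbstractVector{<:Real}; max_lag::Int = length(x) - 1)
    n = length(x)
    xc = x .- mean(x)              # centred chain
    c0 = sum(abs2, xc) / n         # lag-0 autocovariance
    c0 == 0 && return 1.0          # constant chain: no autocorrelation to sum
    τ = 1.0
    for k in 1:min(max_lag, n - 1)
        ρ = sum(xc[t] * xc[t + k] for t in 1:(n - k)) / (n * c0)
        ρ < 0 && break             # truncate at the first negative autocorrelation
        τ += 2ρ
    end
    return τ
end
```

An effective sample size then follows as length(x) / τ.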
DistanceDependentCRP.summarize_mcmc - Function
summarize_mcmc(samples::AbstractMCMCSamples, diag::MCMCDiagnostics)

Compute comprehensive summary of MCMC run. Automatically discovers and computes diagnostics for all parameter fields in the samples struct.

source
DistanceDependentCRP.get_parameter_fields - Function
get_parameter_fields(samples::AbstractMCMCSamples)

Discover parameter fields in the samples struct. Returns field names that are 2D matrices (n_samples × n_obs), excluding c. These represent per-observation parameter samples.

source
DistanceDependentCRP.compute_param_summary - Function
compute_param_summary(samples::AbstractMCMCSamples, fname::Symbol)

Compute mean across observations for each sample iteration. Returns a vector of length n_samples suitable for ESS/IAT computation.

source

Simulation Utilities

DistanceDependentCRP.simulate_poisson_data - Function
simulate_poisson_data(n, cluster_rates; α=0.1, scale=1.0, x=nothing)

Simulate Poisson data with DDCRP clustering.

Arguments

  • x: Optional pre-specified 1D covariate vector of length n. If nothing (default), positions are drawn uniformly from [0, 1].
source
DistanceDependentCRP.simulate_binomial_data - Function
simulate_binomial_data(n, N, cluster_probs; α=0.1, scale=1.0, x=nothing)

Simulate Binomial data with DDCRP clustering.

Arguments

  • n: Number of observations
  • N: Number of trials (scalar or vector)
  • cluster_probs: True cluster success probabilities
  • x: Optional pre-specified 1D covariate vector of length n. If nothing (default), positions are drawn uniformly from [0, 1].
source
DistanceDependentCRP.simulate_gamma_data - Function
simulate_gamma_data(n, cluster_shapes, cluster_rates; α=0.1, scale=1.0, x=nothing)

Simulate Gamma data with DDCRP clustering.

Arguments

  • n: Number of observations
  • cluster_shapes: True cluster shape parameters (α_k)
  • cluster_rates: True cluster rate parameters (β_k)
  • α: DDCRP concentration parameter
  • scale: DDCRP distance scale
  • x: Optional pre-specified 1D covariate vector of length n. If nothing (default), positions are drawn uniformly from [0, 1].

Returns

Named tuple with:

  • y: Observed positive continuous values
  • α_shape: Shape per observation (named α_shape to avoid conflict with DDCRP α)
  • β: Rate per observation
  • c: Customer assignments
  • tables: Table structure
  • x: Covariate (used to construct distance)
  • D: Distance matrix
source

Posterior Analysis

DistanceDependentCRP.compute_vi_trace - Function
compute_vi_trace(c_samples::Matrix{Int}, c_true::Vector{Int}) -> Vector{Float64}

Variation of Information between each MCMC sample partition and the true partition. Lower is better; 0 = perfect recovery.

VI(U, V) = H(U|V) + H(V|U) where H is conditional entropy, computed from the contingency table of the two partitions.

source
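The VI formula can be illustrated for a single pair of partitions. The sketch below is not the package's code: `vi_sketch` is a hypothetical name, and it assumes both inputs are vectors of cluster labels (one label per observation), which may differ from how compute_vi_trace interprets its arguments.

```julia
# Hypothetical sketch: VI(U, V) = H(U|V) + H(V|U) from a contingency table.
function vi_sketch(u::AbstractVector{<:Integer}, v::AbstractVector{<:Integer})
    n = length(u)
    @assert n == length(v)
    joint = Dict{Tuple{Int,Int},Int}()   # contingency counts n_ab
    nu = Dict{Int,Int}()                 # cluster sizes in u
    nv = Dict{Int,Int}()                 # cluster sizes in v
    for (a, b) in zip(u, v)
        joint[(a, b)] = get(joint, (a, b), 0) + 1
        nu[a] = get(nu, a, 0) + 1
        nv[b] = get(nv, b, 0) + 1
    end
    vi = 0.0
    for ((a, b), nab) in joint
        p = nab / n
        # -p*log p(a|b) contributes to H(U|V); -p*log p(b|a) to H(V|U)
        vi -= p * (log(nab / nv[b]) + log(nab / nu[a]))
    end
    return vi
end
```

Because only co-occurrence counts matter, relabelling the clusters of either partition leaves the result unchanged.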
DistanceDependentCRP.point_estimate_clustering - Function
point_estimate_clustering(c_samples::Matrix{Int}; method=:MAP)

Compute a point estimate of the clustering from posterior samples.

Methods

  • :MAP: Most frequent clustering configuration
  • :median_K: Sample with number of clusters closest to median
  • :posterior_mean: Threshold similarity matrix (requires further clustering)
source
DistanceDependentCRP.compute_waic - Function
compute_waic(y, λ_samples; burnin=0) -> NamedTuple

Watanabe-Akaike Information Criterion for a Poisson observation model. Lower WAIC indicates better out-of-sample predictive fit.

The observation model is y_i | λ_i ~ Poisson(λ_i), so

lppd   = Σ_i log E_s[ p(y_i | λ_s_i) ]
p_WAIC = Σ_i Var_s[ log p(y_i | λ_s_i) ]
WAIC   = -2 (lppd - p_WAIC)

Arguments

  • y::AbstractVector: observed counts (length n)
  • λ_samples::AbstractMatrix: posterior samples, shape (n_samples × n)
  • burnin::Int=0: rows to discard before computing

Returns

NamedTuple with fields waic, lppd, p_waic, waic_i (per-obs contributions)

source
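The three WAIC formulas can be traced in a small standalone sketch. The names `poisson_logpmf_sketch`, `logmeanexp_sketch` and `waic_sketch` are hypothetical, the Poisson log-pmf is written out by hand to avoid external packages, and the use of the sample variance (Julia's default `var`, with the n − 1 denominator) is an assumption about the estimator.

```julia
using Statistics

# log Poisson pmf: y*log(λ) - λ - log(y!)
function poisson_logpmf_sketch(y::Integer, λ::Real)
    logfact = 0.0
    for k in 2:y
        logfact += log(k)
    end
    return y * log(λ) - λ - logfact
end

# Numerically stable log(mean(exp.(v)))
function logmeanexp_sketch(v)
    m = maximum(v)
    return m + log(mean(exp.(v .- m)))
end

# Hypothetical sketch of WAIC = -2 (lppd - p_WAIC) for Poisson counts.
function waic_sketch(y, λ_samples; burnin::Int = 0)
    λ = λ_samples[(burnin + 1):end, :]    # drop burn-in rows
    n = length(y)
    lppd_i = zeros(n)
    p_i = zeros(n)
    for i in 1:n
        ll = [poisson_logpmf_sketch(y[i], λ[s, i]) for s in 1:size(λ, 1)]
        lppd_i[i] = logmeanexp_sketch(ll)  # log E_s[p(y_i | λ_s_i)]
        p_i[i] = var(ll)                   # Var_s[log p(y_i | λ_s_i)]
    end
    waic_i = -2 .* (lppd_i .- p_i)
    return (waic = sum(waic_i), lppd = sum(lppd_i), p_waic = sum(p_i), waic_i = waic_i)
end
```

With at least two retained samples per observation the variance term is well defined; a single sample would give NaN.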
DistanceDependentCRP.compute_lpml - Function
compute_lpml(y, λ_samples; burnin=0) -> Float64

Log Pseudo-Marginal Likelihood via the Conditional Predictive Ordinate (CPO). Higher LPML indicates better predictive fit.

log CPO_i = -log E_s[ 1 / p(y_i | λ_s_i) ]   (harmonic-mean estimator)
LPML      = Σ_i log CPO_i

Uses log-sum-exp for numerical stability.

Arguments

  • y::AbstractVector: observed counts (length n)
  • λ_samples::AbstractMatrix: posterior samples, shape (n_samples × n)
  • burnin::Int=0: rows to discard before computing
source
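The harmonic-mean CPO estimator with a log-sum-exp guard can be sketched as follows. This is an assumption-laden illustration: `lpml_sketch` is a hypothetical name, and unlike compute_lpml it takes a pre-computed (n_samples × n) log-likelihood matrix rather than y and λ_samples.

```julia
# Hypothetical sketch: LPML = Σ_i log CPO_i, with
# log CPO_i = -log( mean_s exp(-ll[s, i]) ), computed stably.
function lpml_sketch(ll::AbstractMatrix{<:Real})
    S = size(ll, 1)
    lpml = 0.0
    for i in 1:size(ll, 2)
        neg = -ll[:, i]
        m = maximum(neg)                        # shift for log-sum-exp stability
        lpml -= m + log(sum(exp.(neg .- m)) / S)
    end
    return lpml
end
```

Subtracting the maximum before exponentiating keeps the largest term at exp(0) = 1, so very negative log-likelihoods do not overflow when inverted.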
DistanceDependentCRP.compute_psis_loo - Function
compute_psis_loo(ll_matrix::AbstractMatrix{Float64})

Pareto-Smoothed Importance Sampling Leave-One-Out cross-validation (PSIS-LOO). Uses the Zhang & Stephens (2009) GPD tail-fitting approximation.

Arguments

  • ll_matrix: (n_samples × n_obs) matrix where ll_matrix[s, i] = log p(y_i | θ^(s))

Returns

NamedTuple with fields:

  • elpd_loo: Total expected log pointwise predictive density (sum over obs)
  • loo_i: Per-observation ELPD-LOO contributions (length n)
  • k_hat: Per-observation Pareto shape estimates (k̂ > 0.7 indicates instability)
source
DistanceDependentCRP.posterior_predictive - Function
posterior_predictive(model, samples, data, priors)

Generate posterior predictive draws for missing observations.

For each MCMC sample and each missing index j:

  1. Find j's cluster in the sampled assignment vector
  2. Compute the conjugate posterior for ρ_k using observed cluster members
  3. Draw ρ_k ~ Gamma(S_obs + ρ_a, 1/(P_obs + ρ_b))
  4. Draw y_j ~ Poisson(P_j * ρ_k)

Falls back to the prior when no observed data is available for the cluster.

Returns

  • pred::Matrix{Int}: shape (n_samples, n_missing), predictive draws
  • missing_indices::Vector{Int}: which original indices are missing
source
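Steps 2 and 3 reduce to computing two Gamma parameters from the observed members of a cluster. The sketch below shows only that arithmetic, under the assumption that the second Gamma argument is a scale (as the 1/(P_obs + ρ_b) form suggests); `rate_posterior_sketch` is a hypothetical name and no random draws are made, so it needs no external packages.

```julia
# Hypothetical sketch of the conjugate Gamma posterior for a cluster rate ρ_k.
# y_obs, P_obs: counts and exposures of the observed (non-missing) cluster members.
function rate_posterior_sketch(y_obs, P_obs, ρ_a, ρ_b)
    S = isempty(y_obs) ? 0 : sum(y_obs)   # S_obs: total observed count
    P = isempty(P_obs) ? 0 : sum(P_obs)   # P_obs: total observed exposure
    shape = S + ρ_a
    scale = 1 / (P + ρ_b)
    return (shape = shape, scale = scale, mean = shape * scale)
end
```

When the cluster has no observed members, S and P are both zero and the result is Gamma(ρ_a, 1/ρ_b), which is the prior; this matches the fallback described above.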

Sampler Internals

DistanceDependentCRP.get_moving_set - Function
get_moving_set(i, c)

Get the moving set S_i: the connected component containing customer i when customer i's link is temporarily removed (set to self-loop).

source
get_moving_set(i, c, table_Si)

Fast version: find S_i within the known table table_Si (sorted dict key). Traces only within the table instead of recomputing all tables from scratch. The result is sorted (preserves the order of the sorted table_Si).

source
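To make the definition of the moving set concrete: once customer i's link is treated as a self-loop, S_i consists of i plus every customer whose chain of links eventually reaches i. The following is a simple quadratic-time sketch of that idea, not the package's algorithm; `moving_set_sketch` is a hypothetical name.

```julia
# Hypothetical sketch: customers whose link chain reaches i,
# with i's own outgoing link ignored (treated as a self-loop).
function moving_set_sketch(i::Int, c::AbstractVector{<:Integer})
    n = length(c)
    members = Int[]
    for j in 1:n
        k = j
        steps = 0
        # Follow links from j; stop at i, at a self-loop, or after n steps
        # (the step limit guards against cycles that do not contain i).
        while k != i && c[k] != k && steps < n
            k = c[k]
            steps += 1
        end
        k == i && push!(members, j)
    end
    return members   # ascending order, because j runs from 1 to n
end
```

For example, with links c = [1, 1, 2, 3, 5], customers 3 and 4 reach customer 2 through their chains, so the moving set of customer 2 is {2, 3, 4}.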
DistanceDependentCRP.update_c_rjmcmc! - Function
update_c_rjmcmc!(model, i, state, data, priors, birth_proposal, fixed_dim_proposal, log_DDCRP)

Generic RJMCMC update for customer i's assignment. Dispatches on interface methods to handle any number of cluster parameter dicts. Uses in-place state modification with save/restore instead of copying, and delta-posterior computation instead of full posterior evaluation.

Returns (move_type::Symbol, j_star::Int, accepted::Bool)

source
DistanceDependentCRP.update_c_gibbs! - Function
update_c_gibbs!(model, i, state, data, priors, log_DDCRP)

Internal Gibbs sampling update for customer i's assignment. Called by update_c! for marginalised models or ConjugateProposal.

Returns (move_type, new_assignment, accepted). For Gibbs sampling, move_type is always :gibbs and accepted is always true.

source
DistanceDependentCRP.save_entries - Function
save_entries(dicts::NamedTuple, table_keys)

Save current entries for the given table keys from each dict in the NamedTuple. Returns a NamedTuple of Vector{Pair{Vector{Int}, T}} for each dict.

source
DistanceDependentCRP.restore_entries! - Function
restore_entries!(dicts::NamedTuple, saved::NamedTuple, keys_to_delete)

Restore dicts to their saved state:

  1. Delete any keys in keys_to_delete from all dicts
  2. Re-insert all saved entries
source
DistanceDependentCRP.sorted_setdiff - Function
sorted_setdiff(a::Vector{Int}, b::Vector{Int}) -> Vector{Int}

Compute setdiff(a, b) for sorted vectors. Returns a sorted result. Avoids the Set/Dict allocation that Base.setdiff uses internally. Both a and b must be sorted in ascending order.

source
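A set difference on sorted inputs can be done with a single forward pass over both vectors. The sketch below illustrates that two-pointer idea; `sorted_setdiff_sketch` is a hypothetical name and the package's version may be written differently.

```julia
# Hypothetical sketch: elements of sorted `a` that do not occur in sorted `b`.
function sorted_setdiff_sketch(a::Vector{Int}, b::Vector{Int})
    out = Int[]
    j = 1
    for x in a
        # Advance the pointer into b past everything smaller than x.
        while j <= length(b) && b[j] < x
            j += 1
        end
        # Keep x unless b[j] is an exact match.
        if j > length(b) || b[j] != x
            push!(out, x)
        end
    end
    return out
end
```

Each element of a and b is visited at most once, so the cost is linear in length(a) + length(b) and no hash set is built.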
DistanceDependentCRP.sorted_merge - Function
sorted_merge(a::Vector{Int}, b::Vector{Int}) -> Vector{Int}

Merge two sorted vectors into a single sorted vector. Equivalent to sort(vcat(a, b)) but avoids intermediate allocation and sorting. Assumes no duplicates between a and b (disjoint sets).

source
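Merging two sorted vectors is the standard merge step of merge sort. A minimal sketch, with the hypothetical name `sorted_merge_sketch`, could look like this:

```julia
# Hypothetical sketch: merge two ascending vectors into one ascending vector.
function sorted_merge_sketch(a::Vector{Int}, b::Vector{Int})
    out = Vector{Int}(undef, length(a) + length(b))
    i, j, k = 1, 1, 1
    # Repeatedly take the smaller head element.
    while i <= length(a) && j <= length(b)
        if a[i] <= b[j]
            out[k] = a[i]; i += 1
        else
            out[k] = b[j]; j += 1
        end
        k += 1
    end
    # Copy whatever remains of either input.
    while i <= length(a)
        out[k] = a[i]; i += 1; k += 1
    end
    while j <= length(b)
        out[k] = b[j]; j += 1; k += 1
    end
    return out
end
```

The output is allocated once at its final size, which is where the saving over sort(vcat(a, b)) comes from.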

Proposal Utilities

DistanceDependentCRP.fit_inverse_gamma_moments - Function
fit_inverse_gamma_moments(data) -> (α, β) or nothing

Fit InverseGamma(α, β) to data using the method of moments. For X ~ InverseGamma(α, β):

E[X]   = β / (α - 1)             for α > 1
Var[X] = β² / ((α-1)²(α-2))      for α > 2

Returns nothing if fitting fails (insufficient data, zero variance, invalid params).

source
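Solving the two moment equations gives closed forms: dividing E[X]² by Var[X] yields α − 2, so α = E[X]² / Var[X] + 2 and β = E[X] · (α − 1). The sketch below applies these with sample moments. It is illustrative only: `ig_moments_sketch` is a hypothetical name, and the specific failure checks are assumptions about what "fitting fails" covers.

```julia
using Statistics

# Hypothetical sketch of a method-of-moments fit for InverseGamma(α, β).
function ig_moments_sketch(data::AbstractVector{<:Real})
    length(data) < 2 && return nothing          # insufficient data
    m = mean(data)
    v = var(data)
    (m > 0 && v > 0 && isfinite(v)) || return nothing   # zero variance or invalid moments
    α = m^2 / v + 2        # from Var[X] = E[X]² / (α - 2)
    β = m * (α - 1)        # from E[X] = β / (α - 1)
    return (α, β)
end
```

Because v > 0 is required, the fitted α always exceeds 2, so both moments of the fitted distribution exist.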
DistanceDependentCRP.update_cluster_rates! - Function
update_cluster_rates!(model::PoissonClusterRates, state, data, priors, tables)

Update cluster rates using conjugate Gibbs sampling. Posterior (shape, rate): Gamma(λ_a + S_k, λ_b + n_k)

source
update_cluster_rates!(model::PoissonPopulationRates, state, data, priors, tables)

Update cluster rate multipliers using conjugate Gibbs sampling. Posterior (shape, rate): Gamma(ρ_a + S_k, ρ_b + sum_P_k)

source
DistanceDependentCRP.update_α! - Function
update_α!(model::GammaClusterShapeMarg, state, data, priors; prop_sd=0.5)

Update all cluster shape parameters using Metropolis-Hastings on the log scale.

source

Prior Types

DistanceDependentCRP.PoissonPriors - Type
PoissonPriors{T<:Real} <: AbstractPriors

Prior specification for Poisson model.

Fields

  • λ_a::T: Gamma shape parameter for rate λ
  • λ_b::T: Gamma rate parameter for rate λ
source
DistanceDependentCRP.BinomialPriors - Type
BinomialPriors{T<:Real} <: AbstractPriors

Prior specification for Binomial model.

Fields

  • p_a::T: Beta α parameter for success probability p
  • p_b::T: Beta β parameter for success probability p
  • N::Int: Number of trials (can be observation-specific in data)
source