Write once, infer anywhere: Modular, reusable probabilistic inference components

The Julia language is pretty interesting; it’s a scientific/numerical language (think Matlab or Scipy) with a syntax that corresponds pretty closely with mathematical notation. It has a just-in-time compiler so it’s fairly fast (you can write fast loops in it) and that seems to be the reason a lot of people adopt it over, say, Matlab. But that’s not the main reason it’s an interesting language. The main reason, in my opinion, is the type system.
Julia’s type system is a very structured type system. It’s pretty impressive and well thought-out. It lets you do things that would normally be the domain of functional languages like OCaml or Haskell. Better yet, it has strong enough type inference and conversion that most of the time you don’t need to worry about types. In other words, the types only add to the language – they don’t really take anything away from it. I highly recommend the Julia documentation’s own reference on types, as well as the wikibooks entry.

In this blog post I’m going to use Julia’s type system to write the expectation-maximization algorithm in a completely general way.

A couple years ago Dahua Lin posted a message on julia-stats calling for construction of a new language (within Julia) that would take general graphical models (in abstract form) and ‘compile’ them to Julia code. As far as I’m aware the idea didn’t result in any working system, but it’s an interesting idea. This would free modelers of the hassles of re-implementing a whole bunch of inference algorithms (like EM) over and over again. It’s a very intriguing idea and would be awesome if it could be made to work. The attached slides are here (I’m linking it because I will be referring to it in this write-up).

Inspired by that, I decided to use the Distributions.jl package as a starting point to see how far I could go just by defining types and generic functions on those types to do various inference procedures (such as message passing). Note that I’m not defining a new language, just exploring what the Julia compiler itself can do. I found, pleasantly enough, that the interface Distributions.jl provides is rich enough to implement EM on both models Lin mentions in his presentation, just by some minimal EM code, specifying the high-level model and using minimal ‘grunt’ details. To whet your appetite, here’s a generic implementation of the EM algorithm that works, without modification, on any type of mixture (fit_mm_em):

# This returns, for a model and observations x,# the distribution over latent variables.
function infer(m::MixtureModel, x)
        K = length(m.component)  # number of mixture components
        N = size(x,2)            # number of data points
        lq = Array(Float64, N, K)
        for k = 1:K
                lq[:,k] = logpdf(m.component[k], x) .+ logpdf(m.mixing, k)

logp_to_p(lp) = exp(lp .- maximum(lp))

function fit_mm_em{T}(m::MixtureModel{T}, x)
        # Expectation step
        lq = infer(m, x)

        # Normalize log-probability and convert to probability
        q = logp_to_p(lq)
        q = q ./ sum(q,2)

        # Maximization step
        cr = 1:length(m.component)
        comps = [fit_em(m.component[k], x, q[:,k]) for k = cr]
        mix   =  fit_em(m.mixing, [cr], vec(sum(q,1)))

        MixtureModel{T}(mix, comps)

# 'fallback' function
fit_em(m::Distribution, x, w) = fit_mle(typeof(m), x, w)
All that this requires is that fit_em be defined on the type of mixture component. If not, it will use the fallback fit_mle(). The mixture model type is defined as:

# A 'closed' mixture model defining a full generative modeltype MixtureModel{T} <: Distribution
        mixing::Distribution   # Distribution over mixing components
        component::Vector{T}   # Individual component distributionsend
Now let’s have some fun. By passing a Mixture{MvNormal} type to the above function, we can get it to work on a normal gaussian mixture (the first model Lin mentions in his presentation). See gmm_test.jl in the attached code. But what about Lin’s second model (page 9)? That involves mixture components and weights that have priors on their parameters. Do we have to change the EM function above? No. All we have to do is define a new type:
type DistWithPrior <: Distribution
        pri                    # a prior over the parameters of dist (tuple)
        dist::Distribution     # the distribution itselfend

And the corresponding functions on it:

import Distributions:logpdf
fit_em{T<:DistWithPrior}(m::T, x, w) = T(m.pri, fit_map(m.pri, typeof(m.dist), x, w)) logpdf{T<:DistWithPrior}(m::T, x) = logpdf(m.dist, x)

Pretty simple. Now by, say, creating a DistWithPrior(Dirichlet, Categorical) distribution, you can model a categorical variable with a Dirichlet prior, and the EM algorithm will update it according to that prior, automatically.

I think this is pretty cool because this is achieved without ever even worrying about the grunt details of mathematical formulas. You can see this working on a sample data set in dahualin.jl in the attached code.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s