Church: a language for probabilistic modeling

I’ve been reading about Church, a new programming language, developed in a prize-winning MIT doctoral thesis, that’s designed to make computers better at modeling probability distributions.

The idea is that simulation is cheap (given a probability distribution, generate an example outcome) but inference is expensive (given a set of data, which probability distribution most likely generated it?). Inference is essentially a Bayesian task, and it’s what we want to do to understand, say, which borrowers are likeliest to default, or where terrorists are likely to strike again. It’s also a necessary building block of AI.

The problem is that the space of probability distributions that could explain the data is very big. Infinitely big in reality, of course, but still exponentially big after discretizing. Also, while the cost of evaluating a composed function f(g(x)) is just the cost of f plus the cost of g, composing two conditional probability distributions B|A and C|B means marginalizing out the intermediate variable:

P(C|A) = Σ_B P(C, B|A) = Σ_B P(C|B) P(B|A)

and the time to evaluate such sums grows exponentially rather than linearly as distributions are composed.
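
To make the cost concrete, here’s a sketch of that composition in Scheme (the Lisp dialect that Church extends), with the conditional distributions stored as explicit probability tables. The names and table layout (p-b-given-a as a list of pairs, p-c-given-b as a procedure) are mine, purely for illustration:

    ;; Composing conditionals the deterministic way. p-b-given-a is a list
    ;; of (b-value . probability) pairs for a fixed A = a; p-c-given-b is a
    ;; procedure returning P(C = c | B = b). Computing P(C = c | A = a)
    ;; forces a sum over every possible value of B.
    (define (prob-c-given-a p-b-given-a p-c-given-b c)
      (apply +
             (map (lambda (pair)
                    (* (cdr pair)                    ; P(B = b | A = a)
                       (p-c-given-b (car pair) c)))  ; P(C = c | B = b)
                  p-b-given-a)))

Chain several such stages and the nested sums range over every joint setting of the intermediate variables, which is where the exponential blowup comes from.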

Church is an attempt to solve this problem. (Apparently it’s a practical attempt, because the founders have already started a company, Navia Systems, which uses this structure to build probabilistic computers.) The idea is that, instead of describing a probability distribution as a deterministic procedure that evaluates the probabilities of different events, you represent it as a probabilistic procedure for generating samples from it. That is, a random variable is actually a random variable: repeating a computation will not give the same result each time, because each evaluation draws a fresh sample. The computational advantage is that random variables can be composed without summing over all possible values of the intermediate variables.
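
If the same conditionals are instead represented as samplers, composing them is just function application, and no sum over B ever appears. A sketch, with hypothetical sampler names sample-b-given-a and sample-c-given-b:

    ;; Composing conditionals the sampling way: draw one B given A = a,
    ;; then one C given that B. The result is a single draw from C | A = a.
    (define (sample-c-given-a a)
      (sample-c-given-b (sample-b-given-a a)))

Each call returns a different value, but the values are distributed according to the same marginal P(C|A) that the summation above computes.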

Church is based on Lisp. At the lowest level, it replaces Boolean gates with stochastic digital circuits. These circuits are wired together to form Markov chains (the probabilistic counterpart of finite state machines). At the top level, it’s possible to define probabilistic procedures for generating samples from recursively defined distributions.
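
For a taste of that top level, here’s what I gather is the standard first example in the Church literature: a geometric distribution defined recursively in terms of flip, Church’s primitive weighted coin flip. (A sketch from my reading of the paper, so treat it accordingly.)

    ;; (flip p) is an independent coin flip that comes up true with
    ;; probability p. (geometric p) counts the failures before the first
    ;; success, so each evaluation returns a different number: the
    ;; procedure itself is the random variable.
    (define (geometric p)
      (if (flip p)
          0
          (+ 1 (geometric p))))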

When I saw this paper, I thought it might make an interesting top-level post—unfortunately, I’m not the one to do it. I don’t know enough computer science; it’s been ages since I’ve touched Lisp, and lambda calculus is new to me. So this is an open call for volunteers—any brave Bayesians want to blog about a brand new computer language?

(As an aside, I think we need more technical posts so we don’t spend all our time hissing at each other; how would people feel about seeing summaries of recent research in the neuro/cognitive science/AI-ish cluster?)