Compositional language for hypotheses about computations

TLDR: I sketch a language for describing (either Bayesian or infra-Bayesian) hypotheses about computations (i.e. computable[1] logical/​analytic facts). This is achieved via a novel (AFAIK) branch of automata theory, which generalizes both ordinary and tree automata by recasting them in category-theoretic language. The resulting theory can be equipped with a intriguing self-referential feature: automata that “build” and run new automata in “runtime”.

Epistemic status: this is an “extended shortform” i.e. the explanations are terse, the math is a sketch (in particular the proofs are not spelled out) and there might be errors (but probably not fatal errors). I made it a top-level post because it seemed worth reporting and was too long for a shortform.

Motivation

Finite-state automata are a convenient tool for constructing hypotheses classes that admit efficient learning. A classical example is MDPs in reinforcement learning. Ordinary automata accept strings in input. This is convenient when the input is a time sequence. However, in Turing Reinforcement Learning and infra-Bayesian physicalism we want to equip our agent with beliefs about computations, which don’t form a time sequence in the ordinary sense.

Computations are often represented as trees: e.g. syntax trees of lambda calculus, or any other functional programming language, or expressions defining recursive functions. Hence, the obvious guess is using tree automata. Specifically, we will focus on bottom-up tree automata which are more powerful and more convenient for other reasons as well.

One limitation of this is that hypotheses described using (infra-)probabilistic tree automata are missing correlations between computations. Even if two programs contain a literally identical subroutine, or a single program uses multiple copies of an identical subroutine, the uncertainty is about the different copies is uncorrelated.

Another limitation is that sophisticated automata might require exponentially many states, meaning that a straightforward (tabular) description of the transition rules is exponentially large. Instead, we would like a convenient compositional language for constructing such automata (analogously[2] to the compositional language I proposed for MDPs).

Therefore, a generalization is called for.

Deterministic Operadic Automata

Bottom-up deterministic tree automata are algebras over the operad of trees[3] (modulo accepting states, which are not encoded in the algebra structure). More precisely, they are algebras over the free operad generated by the input alphabet of the tree automaton, where the arity of each symbol in the alphabet becomes the operadic arity of the corresponding generator. The transition rules of the tree automaton are the operations in the algebra.

So, we can now define a deterministic automaton over a (colored) operad to be an -algebra. The automaton is “finite” when the colors are a finite set, and the algebra elements of each color are a finite set. An example application is: take the operad of lambda terms, quotient by -reduction and -conversion, and add the renaming of free variables as additional operations into the operad. This leads to hypotheses-about-lambda-expressions which have a better inherent understanding of the semantics. On the other hand, the finiteness condition seems too restrictive now. But, maybe we can get away with, finite number of orbits under the action of the variable-renaming group. More generally, it is not clear which operads lead to a computationally-convenient notion of automaton (similarly how many problems are computationally/​statistically tractable for ordinary automata): this seems like a good problem to investigate.

Deterministic Operadic Transducers

The automata-theoretic notion of “transducer” also has a natural operadic analogue. However, in full generality, this analogue turns out to be much more powerful than classical transducers, even if the underlying operad is just the operad of the free monoid over some alphabet (i.e. in the case where operadic automata are just ordinary automata)[4]: ordinary transducers can only add symbols to the end of the output as they process new input symbols, whereas operadic transducers can e.g. add symbols to the beginning, or write into two “working tapes” s.t. the output is their concatenation.

Given an operad , we define the associated “state operad”[5] as follows. Define “-signature” to be a pair of (i) a list of -colors (the input types) and (ii) an -color (the output type).

  • A color of is a set (the “state space”), and for each , a finite tuple of -signatures (the “variables” defined at the given state).

  • An operation of is a mapping (where are the the input colors and is the output color) and, for each and , an operation of signature in the operad freely generated by and generators corresponding to the components of for all .

Given operads and respective colors, a determinsitic operadic transducer from to can now be defined as an operad morphism from to s.t. for any , the tuple starts with a -ary signature whose output color is . Given such , we can transform any -ary operation with output color to a -ary operation with output color (exercise for the reader).

It is also possible to make transducers process morphisms of general signature. We leave out the details, since the notation in the operad setting is unwieldy. We do explain it for cartesian transducers below.

Deterministic Cartesian Automata/​Transducers

It is often convenient to represent programs as directed acyclic graphs (DAGs) rather than trees. Indeed, we want to be able to define a term once and use it multiple types later. Hence, it is natural to look for a flavor of automaton that can process DAGs. This is achieved by replacing operads with cartesian monoidal categories and algebras by strong monoidal functors from the category to . DAGs are thought of as morphisms in a free cartesian category, just like trees are operations in a free operad.

Given a cartesian category , we can define the “state cartesian category” as follows. Here, a “”-signature” is just a pair of objects (source and target).

  • An object of is a set (the “state space”), and for each , a finite tuple of -signatures (the “variables” defined at the given state).

  • A morphism from to in is a mapping and, for each and , a morphism of signature in the cartesian category freely generated by and generators corresponding to the components of .

Given objects of , we have and is the concatenation of and .

A determinsitic cartesian transducer is now defined as consisting of:

  • Input cartesian category and output cartesian category

  • Primary input -signature

  • Auxiliary input -signatures

  • Input state set

  • Output -signatures

  • Output state set

  • A cartesian functor from to s.t. (i) and for any , is a prefix of (ii) and for any , is a prefix of

A transducer can accept an element of and morphisms of signatures and output an element of and morphisms of signatures .

Compositional language

We can now start defining our compositional language for constructing automata. This language describes[6] automata/​transducers as string diagrams. There are two types of edges: edges labeled with sets, and edges labeled with a cartesian category and some -signature. Vertices are labeled by transducers. Each vertex has incoming edges with labels corresponding to the transducer’s input state space and input signatures, and outgoing edges with labels corresponding to the transducer’s out state space and output signatures. The edges should be thought of as “wires” that conduct set elements or morphisms of prescribed signature respectively.

There is an interesting phenomenon here: string diagrams are in themselves morphisms of some cartesian category (with signature corresponding to labels of the diagram’s external edges)! This allows us to get self-referential, and add a special type of vertex in our diagram, which has one input which is a diagram of the same type (with prescribed external edge labels) and some number of other inputs, and whose semantics is applying the diagram to the inputs (where the outputs and other inputs have labels matching the diagram type).

I think that without further constraints, the resulting language might be Turing complete: at the very least, it is possible to create an infinite loop by coming up with an input and a diagram s.t. contains a transducer which transforms into , and runs its input through and runs another copy of the input through the diagram outputs. [EDIT: Actually, this requires to produce a diagram which contains . Unless we allow this explicitly, it shouldn’t be possible with “plain” cartesian automata. However, I think it is possible with closed cartesian automata (see below and comment).]

To avoid infinite loops, we can add a counter for the “level of meta” (i.e. the meta-vertex in diagrams of level can only accept diagrams of level lower than ). It is also possible to make this counter an ordinal, by representing ordinals in some notation that itself naturally takes the form of trees/​DAGs. Although to get a tractable language, we need to impose some strong constraints: even without the meta-vertex, we can get super-exponential blowup because each transducer can increase the size of the content of a variable by a constant factor for every input symbol. Possibly we should require our category to be equipped with some way to measure the “size” of morphisms (i.e. a functor to , or maybe a lax functor to regarded as a 2-category using its order), and most transducers to e.g. only increase size by .

Monadic Cartesian Automata (MCA)

So far, everything was determinsitic. But, we want our hypotheses to have probabilistic and maybe also “infra-probabilistic” uncertainty. The obvious way to define probabilistic automata is to replace cartesian categories with Markov categories and replace with e.g. (the category of measurable spaces with Markov kernels as morphisms). However, this is poorly suited for our purpose, because (i) we a joint distribution over the outputs of all programs, and (ii) if an identical subroutine is defined multiple times, we want our uncertainty about those copies to be strongly correlated. Intuitively, want we want is allow all automata read from an additional “tape” filled with random bits. This can be naturally expressed using monads.

Fix a monad [7] that is equipped with “lax cartesian structure”[5:1]. By this we mean that it has lax monoidal structure w.r.t. the monoidal structure of given by products, and this structure satisfies that for any sets and mappings and , we have

Here, stands for the diagonal mapping.

Also, we require that is a strength of , where is the monad unit. However, we require neither that arises from this strength nor that it makes commutative (typically neither holds!) We call such a monad “memo monad”.

The prototypical example is when is the set of binary trees whose leaves are labeled by elements of [8]. We think of such a tree as a way to select an element of by reading a stream of bits. Monad multiplication works by attaching the label-trees to the “parent” tree at the corresponding leaves. The monoidal structure works by taking the union of two trees, where each tree is regarded as the subset of corresponding to the bitstrings that describe the paths from the root to each vertex (imagine placing one tree such that it overlaps the other), and propagating the labels to the leaves in the obvious way.

Another example is binary trees whose non-leaf nodes are labeled by symbols from some fixed alphabet . We think of each element of as corresponding to its own stream can be be queried indepedently. We won’t spell out the monoidal structure, but imagine running two subroutines one after the other while memoizing all the queries the first subroutine made from the benefit of the second. As opposed to the first example, this monoidal structure is not symmetric.

We also need to be equipped with a strong monad morphism to the monad defined as follows. For any set ,

Here, is the set and is the distribution monad.

We think of these as distributions over a list of outcomes of size , with each outcome labeled by an element of . Monad multiplication works by combining concatenation of strings with multiplication of probabilities in the natural way. Importantly, there are canonical monad morphisms from to both the free monoid monad and the distribution monad .

The morphism from to means that automatically acquires monad morphisms to (we will denote it ) and (we will denote it ). In the case of binary trees, this morphism is defined by listing the leaves of the tree in horizontal order and assigning probabilities by postulating a fair coin at each non-leaf vertex.

To define automata, we require that our cartesian categories are equipped with a memo monad. Given a cartesian category with memo monad , a -automaton is a strong monoidal functor together with a natural isomorhism between and which is (i) strong monoidal (ii) makes a morphism in the 2-category of monads on arbitrary categories (we call this a “memo functor”). Given objects in and a morphism, is a Markov kernel from (the initial state of the automaton) to (the final state of the automaton).

Notice that, given and , has the marginals , but the corresponding random variables are not independent. (Because of how interacts with the monoidal structure of .) This means we get a non-trivial joint distribution over all morphisms, which is why we introduced monads in the first place.

This formalism has an infra-Bayesian analogue as well: it requires replacing by a the monad (and we know it is a monad) and replacing by the corresponding infra-Bayesian monad. We won’t spell out the details.

Monadic cartesian transducers can be defined, but we won’t spell out the definition. The definition is spelled out for the case of closed cartesian transducers (see below), where the notation is a little simpler.

Processing DAGs with MCAs

As we discussed, programs are DAGs therefore DAGs are the starting point of any chain of transducers we might want to build. DAGs can be naturall regarded as morphisms in a free cartesian category. But how to embed them into a free cartesian category with memo monad? We do it as follows:

  • Assume any “generator” vertex has one outgoing edge (and any number of incoming edge). Such a vertex corresponds to a generator morphism with signature , where the and are generator objects and is the memo monad.

  • A generator morphism can be transformed to a morphism as follows. To get , we compose (i) the monoidal structure for (to get from to (ii) (to get from to ) (iii) the monad multiplication of (to get from to ).

  • The morphisms of form can be vertically and horizontally composed with each other in the usual way to evaluate the DAG. In particular, every edge corresponds to for some generator object .

  • Any external incoming edge labeled by a generator objected is interpreted as the monad unit on in order to get from to .

It would be nice to formulate this process as evaluating a string diagram in the corresponding Kleisli category, but for a memo monad the Kleisli category is not even monoidal. Maybe there is some way to make it work, but atm I don’t know how.

(Monadic) Closed Cartesian Automata/​Transducers

It is convenient to require that our cartesian categories are closed, since it allows simplifying the definition of the “state category” 2-functor and improve its properties. We will also require the category to have arbitrary (not just finite) products. Indeed, let be such a category. Then we define the as follows:

  • An object of is a set (the “state space”), and for each , an object of (the “variables” defined at the given state).

  • A morphism in is a mapping and, for each , a morphism .

Given objects of , we have

Moreover,

So far, we could avoid the infinite products by requiring that is finite. But, when is equipped with a memo monad , we also want to have a memo monad . This will require infinite sets. We define as follows:

is indeed a monad, but to define the memo (monoidal) structure on it, we need additional structure on . Let and be two sets and consider the following two mappings :

In general, this diagram doesn’t commute, i.e. . But, we require that it “lax commutes” in the following sense.

For any set , can be regarded as a category, in which a morphism from to is a mapping s.t. the following conditions hold:

  • For any , .

This allows us to regard and as functors, and we postulate a natural transformation . We also postulate another natural transformation related in the same way to (it is separate since is not symmetric). Since and are actually natural transformations in terms of their dependence on , we require that and are 2-natural transformations. Possibly there are more conditions we need to satisfy but we won’t try to specify them precisely. In the case of trees/​queries, are defined using the observation that, knowing the result of making more queries is sufficient to predict the result of making less queries.

We can now define the monoidal structure on given the monoidal structure on . Consider some objects in . We define

As before, a (monadic) closed cartesian automaton is a (memo) cartesian closed functor from to . A (monadic) closed cartesian transducer from to is a (memo) cartesian closed functor from to . Given an object of , a transducer can be regarded as taking morphisms as inputs producing and as outputs, where:

Notice that these automata are in general not “finite” in any sense. To have a notion of “finite” automaton/​transducer, we need to have a fully faithfull cartesian closed subcategory of representing “finite” objects, which (in some sense?) generates all of if arbitrary products are allowed. We can then define to consist of s.t. is a finite set and for every , is in . A finite automaton is a functor s.t. is a finite set whenever is a finite object, and a finite transducer is a functor s.t. is a finite object whenever is. We should probably also require that, if an infinite product is a finite object, then all of are finite objects and all but a finite number of them are isomorphic to the terminal object . Notice that the later condition is consistent with the way we defined .

Curiously, the state category 2-functor seems to be a “commutative”[9] 2-monad. Indeed, the 2-monad multiplication corresponds to the functor defined as follows. Given an object of :

The 2-monoidal structure corresponds to the functor defined as follows. Given and object of and an object of :

This means that transducers are 1-morphisms in the corresponding Kleisli 2-category. In particular, composition of two transducers is a transducer. Moreover, an automaton is just a transducer from to the terminal closed cartesian category. And, transducer string diagrams are essentially just string diagrams in the Kleisli 2-category.


  1. ↩︎

    We can use the same methods to talk about arbitrary describable facts, but I’m content with computable.

  2. ↩︎

    The analogy is very rough, it is possible that there is some stronger compositional language that contains both as segments.

  3. ↩︎

    This is obvious, but I haven’t seen it in the literature.

  4. ↩︎

    I don’t know whether even this special case is already known under some other name in automata theory.

  5. ↩︎↩︎

    I don’t know whether there is a standard name for this in category theory.

  6. ↩︎

    In full generality, I’m not sure the composition of transducers yields transducers (however, it certainly does for closed cartesian automata, see below), so this language might define a broader class of “automata” rather than just describing them.

  7. ↩︎

    Instead of , we can use a different cartesian category for the state spaces of our automata, e.g. compact Polish spaces. But, we will continue the explanation using , for the sake of simplicity.

  8. ↩︎

    Equivalently, is the free magma on . In fact, algebras over are precisely magmas.

  9. ↩︎

    Not sure what is the right term for 2-monads.