Compositional language for hypotheses about computations

Vanessa KosoyMar 11, 2023, 7:43 PM

LW: 38 AF: 17

Infra-Bayesianism Logical Uncertainty Agent Foundations AI

TLDR: I sketch a language for describing (either Bayesian or infra-Bayesian) hypotheses about computations (i.e. computable^[1] logical/analytic facts). This is achieved via a novel (AFAIK) branch of automata theory, which generalizes both ordinary and tree automata by recasting them in category-theoretic language. The resulting theory can be equipped with a intriguing self-referential feature: automata that “build” and run new automata in “runtime”.

Epistemic status: this is an “extended shortform” i.e. the explanations are terse, the math is a sketch (in particular the proofs are not spelled out) and there might be errors (but probably not fatal errors). I made it a top-level post because it seemed worth reporting and was too long for a shortform.

Motivation

Finite-state automata are a convenient tool for constructing hypotheses classes that admit efficient learning. A classical example is MDPs in reinforcement learning. Ordinary automata accept strings in input. This is convenient when the input is a time sequence. However, in Turing Reinforcement Learning and infra-Bayesian physicalism we want to equip our agent with beliefs about computations, which don’t form a time sequence in the ordinary sense.

Computations are often represented as trees: e.g. syntax trees of lambda calculus, or any other functional programming language, or expressions defining recursive functions. Hence, the obvious guess is using tree automata. Specifically, we will focus on bottom-up tree automata which are more powerful and more convenient for other reasons as well.

One limitation of this is that hypotheses described using (infra-)probabilistic tree automata are missing correlations between computations. Even if two programs contain a literally identical subroutine, or a single program uses multiple copies of an identical subroutine, the uncertainty is about the different copies is uncorrelated.

Another limitation is that sophisticated automata might require exponentially many states, meaning that a straightforward (tabular) description of the transition rules is exponentially large. Instead, we would like a convenient compositional language for constructing such automata (analogously^[2] to the compositional language I proposed for MDPs).

Therefore, a generalization is called for.

Deterministic Operadic Automata

Bottom-up deterministic tree automata are algebras over the operad of trees^[3] (modulo accepting states, which are not encoded in the algebra structure). More precisely, they are algebras over the free operad generated by the input alphabet of the tree automaton, where the arity of each symbol in the alphabet becomes the operadic arity of the corresponding generator. The transition rules of the tree automaton are the operations in the algebra.

So, we can now define a deterministic automaton over a (colored) operad $O$ to be an $O$ -algebra. The automaton is “finite” when the colors are a finite set, and the algebra elements of each color are a finite set. An example application is: take the operad of lambda terms, quotient by $η$ -reduction and $α$ -conversion, and add the renaming of free variables as additional operations into the operad. This leads to hypotheses-about-lambda-expressions which have a better inherent understanding of the semantics. On the other hand, the finiteness condition seems too restrictive now. But, maybe we can get away with, finite number of orbits under the action of the variable-renaming group. More generally, it is not clear which operads lead to a computationally-convenient notion of automaton (similarly how many problems are computationally/statistically tractable for ordinary automata): this seems like a good problem to investigate.

Deterministic Operadic Transducers

The automata-theoretic notion of “transducer” also has a natural operadic analogue. However, in full generality, this analogue turns out to be much more powerful than classical transducers, even if the underlying operad is just the operad of the free monoid over some alphabet (i.e. in the case where operadic automata are just ordinary automata)^[4]: ordinary transducers can only add symbols to the end of the output as they process new input symbols, whereas operadic transducers can e.g. add symbols to the beginning, or write into two “working tapes” s.t. the output is their concatenation.

Given an operad $O$ , we define the associated “state operad”^[5] $O^{S}$ as follows. Define “ $O$ -signature” to be a pair of (i) a list of $O$ -colors (the input types) and (ii) an $O$ -color (the output type).

A color $A$ of $O^{S}$ is a set $S (A)$ (the “state space”), and for each $x \in S (A)$ , a finite tuple $V (x)$ of $O$ -signatures (the “variables” defined at the given state).
An operation of $O^{S}$ is a mapping $f : \prod_{i} S (A_{i}) \to S (B)$ (where $A_{i}$ are the the input colors and $B$ is the output color) and, for each $x \in \prod_{i} S (A_{i})$ and $σ \in V (f (x))$ , an operation of signature $σ$ in the operad freely generated by $O$ and generators corresponding to the components of $V (x_{i})$ for all $i$ .

Given operads $O_{1, 2}$ and $A, B$ respective colors, a determinsitic operadic transducer from $A$ to $B$ can now be defined as an operad morphism $F$ from $O_{1}$ to $O_{2}^{S}$ s.t. for any $x \in S (F (A))$ , the tuple $V (x)$ starts with a $0$ -ary signature whose output color is $B$ . Given such $F$ , we can transform any $0$ -ary operation with output color $A$ to a $0$ -ary operation with output color $B$ (exercise for the reader).

It is also possible to make transducers process morphisms of general signature. We leave out the details, since the notation in the operad setting is unwieldy. We do explain it for cartesian transducers below.

Deterministic Cartesian Automata/Transducers

It is often convenient to represent programs as directed acyclic graphs (DAGs) rather than trees. Indeed, we want to be able to define a term once and use it multiple types later. Hence, it is natural to look for a flavor of automaton that can process DAGs. This is achieved by replacing operads with cartesian monoidal categories and algebras by strong monoidal functors from the category to $S e t$ . DAGs are thought of as morphisms in a free cartesian category, just like trees are operations in a free operad.

Given a cartesian category $C$ , we can define the “state cartesian category” $C^{S}$ as follows. Here, a “ $C$ ”-signature” is just a pair of objects (source and target).

An object $A$ of $C^{S}$ is a set $S (A)$ (the “state space”), and for each $x \in S (A)$ , a finite tuple $V (x)$ of $C$ -signatures (the “variables” defined at the given state).
A morphism from $A$ to $B$ in $C^{S}$ is a mapping $f : S (A) \to S (B)$ and, for each $x \in S (A)$ and $σ \in V (f (x))$ , a morphism of signature $σ$ in the cartesian category freely generated by $C$ and generators corresponding to the components of $V (x)$ .

Given $A, B$ objects of $C^{S}$ , we have $S (A \times B) = S (A) \times S (B)$ and $V (x, y)$ is the concatenation of $V (x)$ and $V (y)$ .

A determinsitic cartesian transducer is now defined as consisting of:

Input cartesian category $C$ and output cartesian category $D$
Primary input $C$ -signature $σ_{0} = A \to B$
Auxiliary input $D$ -signatures $σ_{1} \dots σ_{n}$
Input state set $X$
Output $D$ -signatures $τ_{1} \dots τ_{m}$
Output state set $Y$
A cartesian functor $F$ from $C$ to $D^{S}$ s.t. (i) $S (F (A)) = X$ and for any $x \in X$ , $V (x)$ is a prefix of $σ_{1} \dots σ_{n}$ (ii) $S (F (B)) = Y$ and for any $y \in Y$ , $τ_{1} \dots τ_{m}$ is a prefix of $V (y)$

A transducer can accept an element of $X$ and morphisms of signatures $σ_{0} \dots σ_{n}$ and output an element of $Y$ and morphisms of signatures $τ_{1} \dots τ_{m}$ .

Compositional language

We can now start defining our compositional language for constructing automata. This language describes^[6] automata/transducers as string diagrams. There are two types of edges: edges labeled with sets, and edges labeled with a cartesian category $C$ and some $C$ -signature. Vertices are labeled by transducers. Each vertex has incoming edges with labels corresponding to the transducer’s input state space and input signatures, and outgoing edges with labels corresponding to the transducer’s out state space and output signatures. The edges should be thought of as “wires” that conduct set elements or morphisms of prescribed signature respectively.

There is an interesting phenomenon here: string diagrams are in themselves morphisms of some cartesian category (with signature corresponding to labels of the diagram’s external edges)! This allows us to get self-referential, and add a special type of vertex in our diagram, which has one input which is a diagram of the same type (with prescribed external edge labels) and some number of other inputs, and whose semantics is applying the diagram to the inputs (where the outputs and other inputs have labels matching the diagram type).

I think that without further constraints, the resulting language might be Turing complete: at the very least, it is possible to create an infinite loop by coming up with an input $α$ and a diagram $D$ s.t. $D$ contains a transducer $T$ which transforms $α$ into $D$ , and $D$ runs its input through $T$ and runs another copy of the input through the diagram $T$ outputs. [EDIT: Actually, this requires $T$ to produce a diagram $D$ which contains $T$ . Unless we allow this explicitly, it shouldn’t be possible with “plain” cartesian automata. However, I think it is possible with closed cartesian automata (see below and comment).]

To avoid infinite loops, we can add a counter for the “level of meta” (i.e. the meta-vertex in diagrams of level $k$ can only accept diagrams of level lower than $k$ ). It is also possible to make this counter an ordinal, by representing ordinals in some notation that itself naturally takes the form of trees/DAGs. Although to get a tractable language, we need to impose some strong constraints: even without the meta-vertex, we can get super-exponential blowup because each transducer can increase the size of the content of a variable by a constant factor for every input symbol. Possibly we should require our category to be equipped with some way to measure the “size” of morphisms (i.e. a functor to $N$ , or maybe a lax functor to $N$ regarded as a 2-category using its order), and most transducers to e.g. only increase size by $O (1)$ .

Monadic Cartesian Automata (MCA)

So far, everything was determinsitic. But, we want our hypotheses to have probabilistic and maybe also “infra-probabilistic” uncertainty. The obvious way to define probabilistic automata is to replace cartesian categories with Markov categories and replace $S e t$ with e.g. $S t o c h$ (the category of measurable spaces with Markov kernels as morphisms). However, this is poorly suited for our purpose, because (i) we a joint distribution over the outputs of all programs, and (ii) if an identical subroutine is defined multiple times, we want our uncertainty about those copies to be strongly correlated. Intuitively, want we want is allow all automata read from an additional “tape” filled with random bits. This can be naturally expressed using monads.

Fix a monad $M_{0} : S e t \to S e t$ ^[7] that is equipped with “lax cartesian structure”^[5:1]. By this we mean that it has lax monoidal structure $μ^{0}$ w.r.t. the monoidal structure of $S e t$ given by products, and this structure satisfies that for any sets $X, Y, Z$ and mappings $f : X \to Y$ and $g : X \to Z$ , we have

$M_{0} ((f \times g) \circ Δ_{X}) = μ_{Y, Z}^{0} \circ (M_{0} (f) \times M_{0} (g)) \circ Δ_{M_{0} (X)}$

Here, $Δ$ stands for the diagonal mapping.

Also, we require that $μ_{X, Y}^{0} \circ (η_{X} \times {i d}_{M_{0} (Y)})$ is a strength of $M_{0}$ , where $η$ is the monad unit. However, we require neither that $μ^{0}$ arises from this strength nor that it makes $M_{0}$ commutative (typically neither holds!) We call such a monad “memo monad”.

The prototypical example is when $M_{0} (X)$ is the set of binary trees whose leaves are labeled by elements of $X$ ^[8]. We think of such a tree as a way to select an element of $X$ by reading a stream of bits. Monad multiplication works by attaching the label-trees to the “parent” tree at the corresponding leaves. The monoidal structure works by taking the union of two trees, where each tree is regarded as the subset of ${0, 1}^{*}$ corresponding to the bitstrings that describe the paths from the root to each vertex (imagine placing one tree such that it overlaps the other), and propagating the labels to the leaves in the obvious way.

Another example is binary trees whose non-leaf nodes are labeled by symbols from some fixed alphabet $Σ$ . We think of each element of $Σ$ as corresponding to its own stream can be be queried indepedently. We won’t spell out the monoidal structure, but imagine running two subroutines one after the other while memoizing all the queries the first subroutine made from the benefit of the second. As opposed to the first example, this monoidal structure is not symmetric.

We also need $M_{0}$ to be equipped with a strong monad morphism to the monad $P F M$ defined as follows. For any set $X$ ,

$P F M (X) := \infty ⨆ n = 1 (X^{n} \times D ([n]))$

Here, $[n]$ is the set ${0, 1 \dots n - 1}$ and $D$ is the distribution monad.

We think of these as distributions over a list of outcomes of size $n$ , with each outcome labeled by an element of $X$ . Monad multiplication works by combining concatenation of strings with multiplication of probabilities in the natural way. Importantly, there are canonical monad morphisms from $P F M$ to both the free monoid monad $F M$ and the distribution monad $D$ .

The morphism from $M_{0}$ to $P F M$ means that $M_{0}$ automatically acquires monad morphisms to $F M$ (we will denote it $m_{F M}$ ) and $D$ (we will denote it $m_{D}$ ). In the case of binary trees, this morphism is defined by listing the leaves of the tree in horizontal order and assigning probabilities by postulating a fair coin at each non-leaf vertex.

To define automata, we require that our cartesian categories are equipped with a memo monad. Given a cartesian category $C$ with memo monad $M$ , a $(C, M)$ -automaton is a strong monoidal functor $F : C \to S e t$ together with a natural isomorhism between $M_{0} \circ F$ and $F \circ M$ which is (i) strong monoidal (ii) makes $F$ a morphism in the 2-category of monads on arbitrary categories (we call this a “memo functor”). Given $A, B$ objects in $C$ and $α : A \to M (B)$ a morphism, $m_{D} (F (α))$ is a Markov kernel from $F (A)$ (the initial state of the automaton) to $F (B)$ (the final state of the automaton).

Notice that, given $α_{1} : A_{1} \to B_{1}$ and $α_{2} : A_{2} \to B_{2}$ , $m_{D} (F (α_{1} \times α_{2}))$ has the marginals $m_{D} (F (α_{1} \times α_{i}))$ , but the corresponding random variables are not independent. (Because of how $m_{D}$ interacts with the monoidal structure of $M_{0}$ .) This means we get a non-trivial joint distribution over all morphisms, which is why we introduced monads in the first place.

This formalism has an infra-Bayesian analogue as well: it requires replacing $D$ by a the $□$ monad (and we know it is a monad) and replacing $P F M$ by the corresponding infra-Bayesian monad. We won’t spell out the details.

Monadic cartesian transducers can be defined, but we won’t spell out the definition. The definition is spelled out for the case of closed cartesian transducers (see below), where the notation is a little simpler.

Processing DAGs with MCAs

As we discussed, programs are DAGs therefore DAGs are the starting point of any chain of transducers we might want to build. DAGs can be naturall regarded as morphisms in a free cartesian category. But how to embed them into a free cartesian category with memo monad? We do it as follows:

Assume any “generator” vertex has one outgoing edge (and any number of incoming edge). Such a vertex corresponds to a generator morphism with signature $\prod_{i} A_{i} \to M (B)$ , where the $A_{i}$ and $B$ are generator objects and $M$ is the memo monad.
A generator morphism $α : \prod_{i} A_{i} \to M (B)$ can be transformed to a morphism $α^{K} : \prod_{i} M (A_{i}) \to M (B)$ as follows. To get $α^{K}$ , we compose (i) the monoidal structure for $M$ (to get from $\prod_{i} M (A_{i})$ to $M (\prod_{i} A_{i})$ (ii) $M (α)$ (to get from $M (\prod_{i} A_{i})$ to $M^{2} (B)$ ) (iii) the monad multiplication of $M$ (to get from $M^{2} (B)$ to $M (B)$ ).
The morphisms of form $α^{K}$ can be vertically and horizontally composed with each other in the usual way to evaluate the DAG. In particular, every edge corresponds to $M (A)$ for some generator object $A$ .
Any external incoming edge labeled by a generator objected $A$ is interpreted as the monad unit on $A$ in order to get from $A$ to $M (A)$ .

It would be nice to formulate this process as evaluating a string diagram in the corresponding Kleisli category, but for a memo monad the Kleisli category is not even monoidal. Maybe there is some way to make it work, but atm I don’t know how.

(Monadic) Closed Cartesian Automata/Transducers

[EDIT: This section contains an error, pointed out to me by David Jaz Myers. I don’t currently have a working construction of transducers as 1-morphisms in a Kleisli 2-category. It’s probably repairable by changing the setting somehow.]

It is convenient to require that our cartesian categories are closed, since it allows simplifying the definition of the “state category” 2-functor and improve its properties. We will also require the category to have arbitrary (not just finite) products. Indeed, let $C$ be such a category. Then we define the $C^{S}$ as follows:

An object $A$ of $C^{S}$ is a set $S (A)$ (the “state space”), and for each $x \in S (A)$ , an object $V_{A} (x)$ of $C$ (the “variables” defined at the given state).
A morphism $α : A \to B$ in $C^{S}$ is a mapping $S (α) : S (A) \to S (B)$ and, for each $x \in S (A)$ , a morphism $V_{α} (x) : V_{A} (x) \to V_{B} (S (α, x))$ .

Given $A, B$ objects of $C^{S}$ , we have $S (A \times B) = S (A) \times S (B)$ $V_{A \times B} (x, y) = V_{A} (x) \times V_{B} (y)$

Moreover, $S (B^{A}) = S (B)^{S (A)}$ $V_{B^{A}} (f) = \prod x \in S (A) V_{B} (f (x))^{V_{A} (x)}$

So far, we could avoid the infinite products by requiring that $S (A)$ is finite. But, when $C$ is equipped with a memo monad $M$ , we also want $C^{S}$ to have a memo monad $M^{S}$ . This will require infinite sets. We define $M^{S}$ as follows:

$S (M^{S} (A)) := M_{0} (S (A))$ $V_{M^{S} (A)} (x) := \prod i \in [| m_{F M} (S (A)) |] V_{A} (m_{F M} (S (A))_{i})$

$M^{S}$ is indeed a monad, but to define the memo (monoidal) structure on it, we need additional structure on $m_{P F M}$ . Let $X$ and $Y$ be two sets and consider the following two mappings $f_{1, 2} : M_{0} (X) \times M_{0} (Y) \to P F M (X)$ :

$f_{1} := m_{P F M} (X) \circ {p r}_{M_{0} (X)}$ $f_{2} := m_{P F M} (X) \circ M_{0} ({p r}_{X}) \circ μ_{X, Y}^{0}$

In general, this diagram doesn’t commute, i.e. $f_{1} \neq f_{2}$ . But, we require that it “lax commutes” in the following sense.

For any set $Z$ , $P F M (Z)$ can be regarded as a category, in which a morphism from $(u, p) \in Z^{n} \times D [n]$ to $(v, q) \in Z^{m} \times D [m]$ is a mapping $α : [n] \to [m]$ s.t. the following conditions hold:

For any $i \in [n]$ , $u_{i} = v_{α (i)}$ .
$α_{*} (p) = q$

This allows us to regard $f_{1}$ and $f_{2}$ as functors, and we postulate a natural transformation $ϕ_{X, Y} : f_{1} \to f_{2}$ . We also postulate another natural transformation $ϕ_{X, Y}^{'}$ related in the same way to ${p r}_{Y}$ (it is separate since $μ^{0}$ is not symmetric). Since $f_{1}$ and $f_{2}$ are actually natural transformations in terms of their dependence on $X, Y$ , we require that $ϕ$ and $ϕ^{'}$ are 2-natural transformations. Possibly there are more conditions we need to satisfy but we won’t try to specify them precisely. In the case of trees/queries, $ϕ, ϕ^{'}$ are defined using the observation that, knowing the result of making more queries is sufficient to predict the result of making less queries.

We can now define the monoidal structure $μ^{S}$ on $M^{S}$ given the monoidal structure $μ$ on $M$ . Consider some $A, B$ objects in $C^{S}$ . We define

$S (μ_{A, B}^{S}) := μ_{S (A), S (B)}^{0}$ $V_{μ_{A, B}^{S}} (x, y)_{i} := {i d}_{V_{A} (x)_{ϕ_{S (A), S (B)} (i)}} \times {i d}_{V_{B} (x)_{ϕ_{S (A), S (B)}^{'} (i)}}$

As before, a (monadic) closed cartesian automaton is a (memo) cartesian closed functor from $C$ to $S e t$ . A (monadic) closed cartesian transducer from $C$ to $D$ is a (memo) cartesian closed functor from $C$ to $D^{S}$ . Given an object $A$ of $C$ , a transducer $F$ can be regarded as taking morphisms $α : 1 \to A$ as inputs producing $y$ and $β$ as outputs, where:

$y := S (F (α))$
$β := V_{F (α)}$

Notice that these automata are in general not “finite” in any sense. To have a notion of “finite” automaton/transducer, we need to have a fully faithfull cartesian closed subcategory $C_{f i n}$ of $C$ representing “finite” objects, which (in some sense?) generates all of $C$ if arbitrary products are allowed. We can then define $C_{f i n}^{S}$ to consist of $A$ s.t. $S (A)$ is a finite set and for every $x \in S (A)$ , $V_{A} (x)$ is in $C_{f i n}$ . A finite automaton is a functor $F$ s.t. $F (A)$ is a finite set whenever $A$ is a finite object, and a finite transducer is a functor $F$ s.t. $F (A)$ is a finite object whenever $A$ is. We should probably also require that, if an infinite product $\prod_{β} A_{β}$ is a finite object, then all of $A_{β}$ are finite objects and all but a finite number of them are isomorphic to the terminal object $1$ . Notice that the later condition is consistent with the way we defined $C_{f i n}^{S}$ .

Curiously, the state category 2-functor seems to be a “commutative”^[9] 2-monad. Indeed, the 2-monad multiplication corresponds to the functor $P : (C^{S})^{S} \to C^{S}$ defined as follows. Given $A$ an object of $(C^{S})^{S}$ :

$S (P (A)) := ⊔_{x \in S (A)} S (V_{A} (x))$
$V_{P (A)} (x, y) := V_{V_{A} (x)} (y)$

The 2-monoidal structure corresponds to the functor $ν : C^{S} \times D^{S} \to (C \times D)^{S}$ defined as follows. Given $A$ and object of $C^{S}$ and $B$ an object of $D$ :

$S (ν (A, B)) := S (A) \times S (B)$
$V_{ν (A, B)} (x, y) := V_{A} (x) \times V_{B} (y)$

This means that transducers are 1-morphisms in the corresponding Kleisli 2-category. In particular, composition of two transducers is a transducer. Moreover, an automaton is just a transducer from $C$ to the terminal closed cartesian category. And, transducer string diagrams are essentially just string diagrams in the Kleisli 2-category.

↩︎
We can use the same methods to talk about arbitrary describable facts, but I’m content with computable.
↩︎
The analogy is very rough, it is possible that there is some stronger compositional language that contains both as segments.
↩︎
This is obvious, but I haven’t seen it in the literature.
↩︎
I don’t know whether even this special case is already known under some other name in automata theory.
↩︎↩︎
I don’t know whether there is a standard name for this in category theory.
↩︎
In full generality, I’m not sure the composition of transducers yields transducers (however, it certainly does for closed cartesian automata, see below), so this language might define a broader class of “automata” rather than just describing them.
↩︎
Instead of $S e t$ , we can use a different cartesian category for the state spaces of our automata, e.g. compact Polish spaces. But, we will continue the explanation using $S e t$ , for the sake of simplicity.
↩︎
Equivalently, $M_{0} (X)$ is the free magma on $X$ . In fact, algebras over $M_{0}$ are precisely magmas.
↩︎
Not sure what is the right term for 2-monads.

What links here?

Vanessa KosoyMar 11, 2023, 7:43 PM

LW: 38 AF: 17

6 comments12 min readLW link

Infra-Bayesianism Logical Uncertainty Agent Foundations AI

Gunnar_Zarncke Mar 12, 2023, 11:03 PM
LW: 5 AF: 4
0
AF
Can you provide some simple or not-so-simple example automata in that language?
- Vanessa Kosoy Mar 16, 2023, 11:36 AM
  LW: 7 AF: 5
  0
  AF Parent
  Good idea!
  
  Example 1
  
  Fix some alphabet $Σ$ . Here’s how you make an automaton that checks that the input sequence (an element of $Σ^{*}$ ) is a subsequence of some infinite periodic sequence with period $n$ . For every $k$ in $Z / n$ , let $A_{k}$ be an automaton that checks whether the symbols in the input sequences at places $i$ s.t. $i \equiv k (mod n)$ are all equal (its number of states is $O (n | Σ |)$ ). We can modify it to make a transducer $A_{k}^{'}$ that produces its unmodified input sequence if the test passes and $⊥$ if the test fails. It also produces $⊥$ when the input is $⊥$ . We then chain $A_{0}^{'}, A_{1}^{'} \dots A_{n - 1}^{'}$ and get the desired automaton. Alternatively, we can connect the $A_{k}$ in parallel and then add an automaton $B$ with $n$ boolean inputs that acts as an AND gate. $B$ is a valid multi-input automaton in our language because AND is associative and commutative (so we indeed get a functor on the product category).
  
  Notice that the description size of this automaton in our language is polynomial in $n$ . On the other hand, a tabular description would be exponential in $n$ (the full automaton has exponentially many states). Moreover, I think that any regular expression for this language is also exponentially large.
  
  Example 2
  
  We only talked about describing deterministic (or probabilistic, or monadic) automata. What about nondeterministic? Here is how you can implement a nondeterministic automaton in the same language, without incurring the exponential penalty of determinization, assuming non-free categories are allowed.
  
  Let $B$ be some category that contains an object $B$ and a morphism $b : B \to B$ s.t. $b \neq {i d}_{B}$ and $b^{2} = b$ . For example it can be the closed cartesian category freely generated by this datum (which I think is well-defined). Then, we can simulate a non-deterministic automaton $A$ on category $C$ by a deterministic transducer from $C$ to $B$ :
  - The state set is always the one element set (or, it can be two elements: “accept” and “reject”).
  - For every state of $A$ , we have a variable of signature $B \to B$ . This variable is intended to hold $b$ when the state is achievable with the current input and ${i d}_{B}$ otherwise.
  - We use the fact that composition of endomorophisms of $B$ behaves as an “OR” operator to implement the transition rules of $A$ .
  However, this trick doesn’t give us nondeterministic transducers. A completely different way to get nondeterminism, which works for transducers as well, is using infra-Bayesian Knightian uncertainty to represent it. We can do via the memo monad approach, but then we get the constraint that nondeterministic transitions happen identically in e.g. identical subgraphs. This doesn’t matter for ordinary automata (that process words) but it matters for tree/graph automata. If we don’t want this constraint, we can probably do it in a framework that uses “infra-Markov” categories (which I don’t know how to define atm, but it seems likely to be possible).
  
  Together with Example 1, this implies that (this version of) our language is strictly more powerful than regular expressions.
  
  Example 3
  
  Suppose we want to take two input sequences (elements of $Σ^{*}$ ) and check them for equality. This is actually impossible without the meta-vertex, because of the commutativity properties of category products. With the meta-vertex (the result is not an automaton anymore, we can call it a “string machine”), here is how we can do it. Denote the input sequences $x$ and $y$ . We apply a transducer to $x$ that transforms it into a string diagram. Each symbol $a$ in $x$ is replaced by a transducer that checks whether the first symbol of its input is $a$ and outputs the same sequence with the first symbol removed. These transducers are chained together. In the end of the chain we add a final transducer that checks whether its input is empty. Finally, we use the meta-vertex to feed $y$ into this chain of transducers.
  
  Notice that this requires only one level of meta: the string diagram we generate doesn’t contain any meta-vertices.
  
  Example 4
  
  More generally, we can simulate any multi-tape automaton using a succinct string machine with one level of meta: First, there is a transducer $T$ that takes the tapes as separate inputs and simulates a single step of the multi-tape automaton. Here, moving the tape head forward corresponds to outputting a sequence with the first symbol omitted. Second, there is a transducer $P$ that takes the inputs $x_{1} \dots x_{n}$ and produces a string diagram that consists of $\sum_{i} | x_{i} |$ copies of $T$ chained together (this transducer just adds another $T$ to the chain per every input symbol). Finally, the desired string machine runs $P$ and then uses a meta-vertex to apply the diagram $P$ produces to the input.
  
  Example 5
  
  A string machine with unbounded meta can simulate a 1D cellular automaton, and hence any Turing machine: First, there is a transducer $T$ which simulates a single step of the automaton (this is a classical transducer, not even the stronger version we allow). Second, we already know there is an automaton $E$ that tests for equality. Third, fix some automaton $A$ whose job is to read the desired output off the final state of the cellular automaton. We modify $E$ to make a transducer $E^{'}$ , which (i) when the inputs are equal, produces a string diagram that consists only of $A$ (ii) when the inputs are different, produces this string machine. This string macine applies $T$ to the input $x$ , then applies $E^{'}$ to $x$ and $T (x)$ , then uses the meta-vertex to apply $E^{'} (x, T (x))$ to $T (x)$ …
  
  Except that I cheated here because the description of $E^{'}$ references a string machine that contains $E^{'}$ . However, with closed machines such quining can be implemented: instead of producing a string diagram, the transducer produces an morphism that converts transducers to string diagrams, and then we feed the same transducer composed in the same way with itself into this morphism.
  What links here?
  - Compositional language for hypotheses about computations by Vanessa Kosoy (Mar 11, 2023, 7:43 PM; 38 points)
Chris van Merwijk Mar 23, 2025, 4:17 AM
LW: 1 AF: 1
0
AF
More precisely, they are algebras over the free operad generated by the input alphabet of the tree automaton
Wouldn’t this fail to preserve the arity of the input alphabet? i.e. you can have trees where a given symbol occurs multiple times, and with different amounts of children? That wouldn’t be allowed from the perspective of the tree automaton right?
- Vanessa Kosoy Mar 23, 2025, 2:30 PM
  LW: 2 AF: 2
  0
  AF Parent
  No? The elements of an operad have fixed arity. When defining a free operad you need to specify the arity of every generator.
  - Chris van Merwijk Mar 23, 2025, 4:00 PM
    LW: 1 AF: 1
    0
    AF Parent
    I may be confused somehow. Feel free to ignore. But:
    * At first I thought you meant the input alphabet to be the colors, not the operations.
    * Instead, am I correct that “the free operad generated by the input alphabet of the tree automaton” is an operad with just one color, and the “operations” are basically all the labeled trees where labels of the nodes are the elements of the alphabet, such that the number of children of a node is always equal to the arity of that label in the input alphabet?
    * That would make sense, as the algebra would then I guess assign the state space of the tree automaton to the single color of the operad, and each arity n operation would be mapped to the mathematical function from Q^n to Q.
    * That would make sense I think, but then why do you talk about a “colored” operad in: “we can now define a deterministic automaton over a (colored) operad $O$ to be an $O$ -algebra”?
    - Vanessa Kosoy Mar 24, 2025, 9:03 AM
      LW: 2 AF: 2
      0
      AF Parent
      You now understand correctly. The reason I switch to colored operads is to add even more generality. My key use case is when the operad consists of terms-with-holes in a programming language, in which case the colors are the types of the terms/holes.

Compositional language for hypotheses about computations

Motivation

Deterministic Operadic Automata

Deterministic Operadic Transducers

Deterministic Cartesian Automata/​Transducers

Compositional language

Monadic Cartesian Automata (MCA)

Processing DAGs with MCAs

(Monadic) Closed Cartesian Automata/​Transducers

Example 1

Example 2

Example 3

Example 4

Example 5

Deterministic Cartesian Automata/Transducers

(Monadic) Closed Cartesian Automata/Transducers