# Consider using reversible automata for alignment research

In recent years, there have been several cases of alignment researchers using Conway’s Game of Life as a research environment:

Introducing SafeLife: Safety Benchmarks for Reinforcement Learning (Wainwright, Eckersley 2019)

Agency in Conway’s Game of Life (Flint 2021)

Finite Factored Sets (Garrabrant 2021)

Optimization Concepts in the Game of Life (Krakovna, Kumar 2021)

Finding gliders in the game of life (Christiano 2022)

A quick aside in The Plan − 2022 Update (Wentworth 2022)

Conway’s Game of Life is by far the most popular and well-known cellular automaton, and for good reason: it’s immediately appealing and just begs to be played with. It is a great model context in which to research things like optimization and agency:

It’s deterministic, making experiments clean and replicable.

It’s discrete in both time and space, which is often easier to analyze and reason about.

The rules are intuitive and simple (unlike, say, the Standard Model).

The board can be finite *or* infinite, which can both be important contexts to analyze.

It has a concept of spatial dimensions and causal propagation, so the boards end up with object-like patterns, and a “speed of light”.

The system is Turing complete, which gives it the potential for profoundly complex and unpredictable behavior, despite the simple rules.

Any agents inside it are inherently “embedded”.

It’s really fun to play with! There are many websites where you can click around and design your own Life animations. Lots of fascinating patterns have been found by people over the years. This can make it a little easier for a researcher to get through the day.

These properties do a good job at mimicking the real universe while being significantly more tractable.

But those who play with Life will notice some odd things about it that are *not* mimicked in the real world, especially if they’re familiar with physics:

There are no obvious conservation laws, of either momentum or energy.

Objects often just disappear.

Structures are very brittle; most interactions between object-like things destroy both objects.

The rules are not reversible; two states can both evolve into the same state, meaning that you can’t “know” where you came from just by looking at the current state.
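As a small illustration of the irreversibility point, here is a sketch in Python. It is not taken from any of the posts listed above; the function name `life_step` and the set-of-coordinates board representation are choices made for this example. It shows three different boards that all evolve into the same empty board after one step, so the empty board alone cannot tell you which of them came before.

```python
def life_step(live):
    """One Game of Life step on an unbounded board.

    `live` is a set of (x, y) coordinates of live cells (a representation
    assumed for this sketch).
    """
    counts = {}
    for (x, y) in live:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    key = (x + dx, y + dy)
                    counts[key] = counts.get(key, 0) + 1
    # A cell is alive next step with exactly 3 live neighbors,
    # or with 2 live neighbors if it is already alive.
    return {c for c, k in counts.items() if k == 3 or (k == 2 and c in live)}


empty = set()
single = {(0, 0)}          # one isolated cell
pair = {(0, 0), (1, 0)}    # two adjacent cells

# Three distinct predecessors, one shared successor.
print(life_step(empty) == life_step(single) == life_step(pair))
```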

These properties are related, and specifically, I think that adopting a reversible rule gives you most of the other properties.

Because the rules are reversible, any evolution on a finite board must eventually loop back to its original state. This lets you prove interesting behavior relatively easily. For example, if two spaceships interact, at least some live pixels must eventually exit the region. If they didn’t, then that would imply that there was a finite board size around the interaction area which would not loop back to the original state. Note that this behavior about interactions carries over to infinite board sizes!
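To make the looping argument concrete, here is a minimal sketch using the Single Rotation rule, one of the reversible rules named in this post. The conventions are assumptions made for the example: a wrap-around board with even side lengths, 2×2 blocks whose offset alternates between 0 and 1 on successive steps, and a clockwise quarter turn for any block holding exactly one live cell. The helper names `step` and `period` are hypothetical.

```python
def step(grid, phase, inverse=False):
    """Apply one block-partition update; `grid` is a tuple of tuples of 0/1.

    `phase` (0 or 1) is the offset of the 2x2 block partition. With
    inverse=True the rotation is undone (counterclockwise).
    """
    n, m = len(grid), len(grid[0])  # both assumed even
    new = [list(row) for row in grid]
    for r in range(phase, n + phase, 2):
        for c in range(phase, m + phase, 2):
            r0, r1, c0, c1 = r % n, (r + 1) % n, c % m, (c + 1) % m
            a, b = grid[r0][c0], grid[r0][c1]  # top-left, top-right
            d, e = grid[r1][c0], grid[r1][c1]  # bottom-left, bottom-right
            if a + b + d + e == 1:
                if inverse:
                    a, b, e, d = b, e, d, a  # quarter turn counterclockwise
                else:
                    a, b, e, d = d, a, b, e  # quarter turn clockwise
            new[r0][c0], new[r0][c1] = a, b
            new[r1][c0], new[r1][c1] = d, e
    return tuple(tuple(row) for row in new)


def period(grid, max_steps=200_000):
    """Number of steps until (grid, phase) first returns to its start."""
    state, phase = grid, 0
    for t in range(1, max_steps + 1):
        state = step(state, phase)
        phase ^= 1
        if state == grid and phase == 0:
            return t
    return None  # not reached within max_steps


start = (
    (0, 1, 0, 0),
    (0, 0, 0, 0),
    (0, 0, 1, 1),
    (0, 0, 0, 0),
)
print("returns to start after", period(start), "steps")
```

Each block update is a permutation of the possible block contents, so a backward step exists and the pair (board, phase) has to come back to where it started. This particular rule also keeps the number of live cells fixed, so on a 4×4 board with three live cells there are at most 560 × 2 such pairs and the loop is short.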

Some reversible rules attain robustness by having conservation of the number of live pixels. The structures won’t necessarily stay the same, but at least they won’t disappear entirely. (Here’s an example rule that does *not* conserve pixel count.)

Overall, I think it’s worth us doing more research with reversible rules because it appears that *our* universe has a reversible rule, at least to an extremely good approximation, and so this might lead to more relevant and accurate conclusions about how optimization and agency work in this universe.

I’ve seen three specific (interesting) reversible cellular automata so far, which are Critters, the billiard ball computer, and the Single Rotation Rule.

I don’t want to nerd snipe alignment researchers, but it might be worth playing around with some of these other automata to see if they are a worthwhile tool. Golly, one of the most developed and widely used cellular automata development platforms, has facilities for defining your own automata rules (and lists Critters on its rules help page). This site is also a very flexible automata simulator, specifically designed for reversible rules. For more discussion, I found this paper (by one of Critters’ creators) very accessible, and also enjoyed this blog post.


This is super interesting. I was wondering if you could give a few more thoughts/intuitions about why you think reversibility is important. I understand that it would make the simulations more physics-like, but why is being physics-like important to alignment research and/or agency research?

I clicked on the paper by the Critter creator, which seems like it might go deeper into that issue, but don’t have the time to read through it right now. Super exciting stuff! Thanks.

I’m (currently) mostly interested in it for the purpose of understanding optimization. If, for example, the world has a finite number of possible states, and the evolution rule is reversible, then no long-term optimization is possible, because all (accessible) states will be visited equally often. That scenario is relatively clear, and I’m trying to understand *exactly* what happens under different constraints, and which kinds of optimization are possible.

Not sure that I understand your claim here about optimization. An optimizer is presumably given some choice of possible initial states to choose from to achieve its goal (otherwise it cannot interact at all). In which case, the set of accessible states will depend upon the chosen initial state and so the optimizer can influence long term behavior and choose whatever best matches its desires.

I share your confusions/intuition about what is meant by optimization here. But I think for the purposes of this post, optimization is defined here, which is linked to at the beginning of this post. In that link, optimization is thought of as a pattern that persists in the face of perturbations and that evolves towards a small set of states. I’m still not totally grokking it though.

Thanks. I think I’ve been tripped up by this terminology more than once now.

Another physics property that would be nice to have is relativity, as it allows objects to have different velocities despite otherwise being the same configuration. However relativity might be too difficult to have in practice, as it puts a lot of constraints on how the automaton should behave, and prevents it from being grid-based.

Alex Mennen mentioned this too. It would be super interesting if we could have a discrete state space that still obeyed special relativity! (I unfortunately never properly learned it.)

One difference is that special relativity is approximately not true at everyday speeds (or at least, it doesn’t need to be accounted for) whereas reversible laws of physics are fundamentally evident in everyday scenarios. So they somehow feel intuitively more of a useful constraint to me.

At everyday speeds, special relativity still holds, it just happens to also be closely approximated by a different kind of relativity that is sometimes called “Newtonian relativity”.

Stuff like relativity is fundamentally about symmetry. You want to say that if you have some trajectory τ which satisfies the laws of physics, and some symmetry σ (such as “have everything move in → direction at a speed of 5 m/s”), then στ must also satisfy the laws of physics.

Interestingly, I think you can actually have something resembling Newtonian relativity with a discrete state space, though in doing so you lose the lightspeed limit, which seems bad. More concretely, consider e.g. the trajectory space in cellular automata like Conway’s Game of Life or Critters. It is $2^{\mathbb{N}^2 \times \mathbb{N}}$ (writing it as $\mathbb{N}^2 \times \mathbb{N}$ rather than $\mathbb{N}^3$ to separate out the time axis).

Suppose we want to create a symmetry corresponding to a Newtonian boost by some vector v. That is, we need to create a symmetry $\sigma : 2^{\mathbb{N}^2 \times \mathbb{N}} \to 2^{\mathbb{N}^2 \times \mathbb{N}}$ such that if you input some trajectory τ, you get a resulting trajectory στ where everything moves in the direction of v. This can be defined by $(\sigma\tau)(q, t) = \tau(q - tv, t)$.
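The boost defined above can be written down directly and tested against a concrete automaton. This is only a sketch, assuming a wrap-around 8×8 board and a set-of-live-cells representation; `life_step`, `boost` and `obeys_life` are names invented for the example. It builds a glider trajectory in the Game of Life, shifts frame t by t·v with v = (1, 0), and checks whether the result is still a valid Life trajectory.

```python
def life_step(live, n):
    """One Game of Life step on an n x n wrap-around board."""
    counts = {}
    for (x, y) in live:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if (dx, dy) != (0, 0):
                    key = ((x + dx) % n, (y + dy) % n)
                    counts[key] = counts.get(key, 0) + 1
    return {c for c, k in counts.items() if k == 3 or (k == 2 and c in live)}


def boost(trajectory, v, n):
    """(sigma tau)(q, t) = tau(q - t*v, t): shift frame t by t*v."""
    return [
        {((x + t * v[0]) % n, (y + t * v[1]) % n) for (x, y) in frame}
        for t, frame in enumerate(trajectory)
    ]


def obeys_life(trajectory, n):
    """True if every frame is the Life successor of the previous one."""
    return all(life_step(a, n) == b for a, b in zip(trajectory, trajectory[1:]))


N = 8
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
trajectory = [glider]
for _ in range(8):
    trajectory.append(life_step(trajectory[-1], N))

print("original obeys Life:", obeys_life(trajectory, N))
print("boosted obeys Life: ", obeys_life(boost(trajectory, (1, 0), N), N))
```

The original trajectory passes and the boosted one fails. That is what translation invariance predicts: Life maps a shifted frame to the same shift of its successor, so the boosted trajectory could only be valid if each frame were unchanged by a shift of v.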

I suspect that there are no interesting automata which satisfy this symmetry. However, one might be able to make some if one expanded from a cell space of 2 to a richer cell space, and also had the symmetry act on that space (for example, one could attach a velocity to each of the cells, and then have the symmetry also add to that velocity).

Still, I think giving up on having a finite speed of light is a pretty serious problem, so I think one would want special relativity rather than Newtonian relativity, and I am pretty sure that is incompatible with a discrete state space.

Ah, right. (I’ve heard this referred to as Galilean relativity.) That does seem like it qualifies as “fundamentally evident in everyday scenarios”.

Cellular automata don’t really even feel like they *have* a concept of velocity, just a fixed rate of causal propagation. And when things “move” it’s just that they’re making changes in the grid next to them, and some patterns just so happen to do so in a way where, after a certain period, it’s the same pattern translated… is that what we think happens in our universe? Are electrons moving “just causal propagations”? Somehow this feels more natural for the Game of Life and less natural for physics.

My claim would basically be that “velocity” as a concept only exists in relativistic systems. Because in a relativistic system, you can take any configuration and apply the symmetry to it to get a configuration that travels at a given velocity. Meanwhile in a non-relativistic system like Game of Life, only very special configurations have a recurring pattern, and they might be better thought of as spacetime crystals than as having an actual velocity.

(In fact, I think it might be useful to think of “velocity exists” as *the* thing that relativity is asserting, rather than “everything is relative” being the thing relativity is asserting.)

This seems too strong. Can’t you write down a linear field theory with no (Galilean or Lorentzian) boost symmetry, but where waves still propagate at constant velocity? Just with a weird dispersion relation?

(Not confident in this, I haven’t actually tried it and have spent very little time thinking about systems without boost symmetry.)

You can probably come up with lots of systems that look approximately like they have velocity. The trouble comes when you want them to exactly satisfy the rule of “for any trajectory t, there is an equivalent trajectory t’ which is exactly the same except everything moves with some given velocity, and it still follows the laws of physics”, because if you have that property then you also have relativity because relativity *is* that property.

I just realized,

This describes *Galilean* relativity. For special relativity you have to shift different objects’ velocities by different amounts, depending on what their velocity already is, so that you don’t cross the speed of light.

So the fact that velocity (and not just rapidity) is used all the time in special relativity is already a counterexample to this being required for velocity to make sense.

Sure. I’d say that property is a lot stronger than “velocity exists as a concept”, which seems like an unobjectionable statement to make about any theory with particles or waves or both.

I guess there’s “velocity exists as a description you can impose on certain things within the trajectory”, and then there’s “velocity exists as a variable that can be given any value”. When I say relativity asserts that velocity exists, I mean in the second sense.

In the former case you would probably not include velocity within causal models of the system, whereas in the latter case you probably would.

As far as I know, condensed matter physicists use velocity and momentum to describe quasiparticles in systems that lack both Galilean and Lorentzian symmetry. I would call that a causal model.

Interesting point. Do the velocities for such quasiparticles act intuitively similar to velocities in ordinary physics?

Yes, it’s exactly the same except for the lack of symmetry. In particular, any quasiparticle can have any velocity (possibly up to some upper limit like the speed of light).

I had to look up “boost symmetry”, so for posterity, here are the results of the lookup. From text-davinci-003:

I found this video on Lorentz transformations by minutephysics to be the best explanation I found, and I now feel I understand well enough to understand the point being made in context.

Here’s a lookup trace:

Very first I tried google, which gave results that seemed to mostly assume I wanted a math reference rather than a first visual explanation; it did link to wikipedia:LorentzTransformation, which does give a nice summary of the math, but I wasn’t yet sure it was the right thing. So then I asked text-davinci-003 (~~because chatgpt is an insufferable teenager and I’m tired of talking to it whereas td3 is a … somewhat less insufferable teenager~~). td3 gave the above explanation.

I was still pretty sure I didn’t quite understand, so I popped the explanation into metaphor.systems which gave me a bunch of vaguely relevant links, probably because it’s not quantum, it’s relativity, but I hadn’t noticed the error yet.

Then I sighed and tried a youtube search for “boost symmetry”. That gave one result, the video I linked above, which *did* explain to my satisfaction, and I stopped looking. I don’t think I could pass many tests on it at the moment, but my visual math system seems to have a solid enough grasp on it for now.

(I enjoyed this style of “log of how I looked something up” comment.)

I have a series of search case studies if you want to read more like that.

curious if you’ve tried metaphor.

Yeah, sorry for the jargon. “System with a boost symmetry” = “relativistic system” as tailcalled was using it above.

Quoting tailcalled:

A “boost” is a transformation of a physical trajectory (“trajectory” = complete history of things happening in the universe) that changes it by adding a fixed offset to everything’s velocity; or equivalently, by making everything in the universe move in some direction while keeping all their relative velocities the same.

This is what we think happens in our universe!

Both general relativity and quantum field theory are field theories: they have degrees of freedom at each point in space (and time), and objects that “move” are just an approximate description of propagating patterns of field excitations that reproduce themselves exactly in another location after some time.

The most accessible example of this is that light is an electromagnetic wave (a pattern of mutually-reinforcing electric and magnetic waves); photons aren’t an additional part of the ontology, they’re just a description of how electromagnetic waves work in a quantum universe.

(Quantum field theory can be described using particles to a very good degree of approximation, but the field formalism includes some observable phenomena that the particle formalism doesn’t, so it has a strictly better claim to being fundamental.)

Beware, though; string theory may be what underlies QFT and GR, and it describes a world of stringy objects that actually do move through space… But at the very least, the cellular-automata perspective on “objects” and “motion” is not at all strange from a modern physics perspective.

EDIT: I might go so far as to claim that the reason all electrons are identical is the same as the reason all gliders are identical.

I think this contrast is wrong.^{[1]} IIRC, strings have the same status in string theory that particles do in QFT. In QM, a wavefunction assigns a complex number to each point in configuration space, where state space has an axis for each property of each particle.^{[2]} So, for instance, a system with 4 particles with only position and momentum will have a 12-dimensional configuration space.^{[3]} IIRC, string theory is basically a QFT over configurations of strings (and also branes?), instead of particles. So the “strings” are just as non-classical as the “fundamental particles” in QFT are.

^{[1]} I don’t know much about string theory though, I could be wrong.

^{[2]} Oversimplifying a bit

^{[3]} 4 particles * 3 dimensions. The reason it isn’t 24-dimensional is that position and momentum are canonical conjugates.

QFT doesn’t actually work like that—the “classical degrees of freedom” underlying its configuration space are classical fields over space, not properties of particles.

Note that Quantum Field Theory is not the same as the theory taught in “Quantum Mechanics” courses, which is as you describe.

“Quantum Mechanics” (in common parlance): quantum theory of (a fixed number of) particles, as you describe.

“Quantum Field Theory”: quantum theory of fields, which are ontologically similar to cellular automata.

“String Theory”: quantum theory of strings, and maybe branes, as you describe.*

“Quantum Mechanics” (strictly speaking): any of the above; quantum theory of anything.

You can do a change of basis in QFT and get something that looks like properties of particles (Fock space), and people do this very often, but the actual laws of physics in a QFT (the Lagrangian) can’t be expressed nicely in the particle ontology because of nonperturbative effects. This doesn’t come up often in practice (I spent most of grad school thinking QFT was *agnostic* about whether fields or particles are fundamental), but it’s an important thing to recognize in a discussion about whether modern physics privileges one ontology over the other.

(Note that even in the imperfect particle ontology / Fock space picture, you don’t have a finite-dimensional classical configuration space. 12 dimensions for 4 particles works great until you end up with a superposition of states with different particle numbers!)

String theory is as you describe, AFAIK, which is why I contrasted it to QFT. But maybe a real string theorist would tell me that nobody believes those strings are the fundamental degrees of freedom, just like particles aren’t the fundamental degrees of freedom in QFT.

*Note: People sometimes use “string theory” to refer to weirder things like M-theory, where nobody knows which degrees of freedom to use...

anyone have thoughts on the usefulness of smoothlife/lenia/etc continuous cellular automata?

I have been thinking about this for quite a while. In particular this paper which learns robust “agents” in Lenia seems very relevant to themes in alignment research: Learning Sensorimotor Agency in Cellular Automata

Continuous cellular automata have a few properties which in my view make them a potentially interesting testbed for agency research in AI alignment:

They seem to be able to support (or make discoverable) much more robust and complex behaviours and agents than discrete CAs, which makes them seem a bit less like “toy” models.

They can be differentiable, which allows for more efficient search for interesting behaviours (as in the linked paper). This should also be amenable to being accelerated by GPUs.

I am hoping to get the time at some point to explore some of these ideas using Lenia (I am working a full time job so it would have to be more of a side project). In particular I would like to re-implement the sensorimotor agency paper then see what avenues that opens. Perhaps trying to quantitatively measure abstraction within Lenia: for example, can we come up with a measure of abstraction that can automatically identify these “agents”? Or something along the lines of the information theory of individuality, to see whether optimizing globally for these measures (with gradient descent) actually produces something that we recognise as agents / individuals.

I will admit that a lot of my motivation for this is just that I find continuous cellular automata fascinating and fun, rather than considering this the most promising direction for alignment research. But I do also think it could be fruitful for alignment research.

This previous comment thread talks about this idea, and I probably read it a while ago and was influenced.

Have spent some time playing with reversible CAs, and can confirm that they are *very* interesting. They are a great example of how provable high-level properties (things like conservation of gliders) can come out of low level properties (reversibility).