Consider using reversible automata for alignment research

In recent years, there have been several cases of alignment researchers using Conway’s Game of Life as a research environment.

Conway’s Game of Life is by far the most popular and well-known cellular automaton, and for good reason: it’s immediately appealing and just begs to be played with. It is a great model context in which to research things like optimization and agency:

  • It’s deterministic, making experiments clean and replicable.

  • It’s discrete in both time and space, which is often easier to analyze and reason about.

  • The rules are intuitive and simple (unlike, say, the Standard Model).

  • The board can be finite or infinite, both of which can be important contexts to analyze.

  • It has a concept of spatial dimensions and causal propagation, so the boards end up with object-like patterns and a “speed of light”.

  • The system is Turing complete, which gives it the potential for profoundly complex and unpredictable behavior, despite the simple rules.

  • Any agents inside it are inherently “embedded”.

  • It’s really fun to play with! There are many websites where you can click around and design your own Life animations. Lots of fascinating patterns have been found by people over the years. This can make it a little easier for a researcher to get through the day.

These properties do a good job of mimicking the real universe while being significantly more tractable.

But those who play with Life will notice some odd things about it that are not mimicked in the real world, especially if they’re familiar with physics:

  • There are no obvious conservation laws, of either momentum or energy.

  • Objects often just disappear.

  • Structures are very brittle; most interactions between object-like things destroy both objects.

  • The rules are not reversible; two states can both evolve into the same state, meaning that you can’t “know” where you came from just by looking at the current state.
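The last point can be checked directly. Below is a minimal sketch of a Life step on an unbounded board, with live cells stored as a set of coordinates; the function name `life_step` and the set-based representation are my own choices for illustration. It shows three different boards that all evolve into the same empty board, so the empty board alone cannot tell you which one came before it.

```python
from collections import Counter


def life_step(cells):
    """Advance Conway's Game of Life by one step.

    cells is a set of (row, col) live cells on an unbounded board.
    """
    # Count, for every position, how many live neighbours it has.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in cells
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # Birth on exactly 3 neighbours; survival on 2 or 3.
    return {
        cell
        for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in cells)
    }


empty = set()
lone_cell = {(0, 0)}
domino = {(0, 0), (0, 1)}

# Three distinct predecessors, one shared successor: the empty board.
assert life_step(empty) == life_step(lone_cell) == life_step(domino) == set()
```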

These properties are related. Specifically, I think that adopting a reversible rule fixes most of the others.

When the rules are reversible, any evolution on a finite board must eventually loop back to its original state: a finite board has only finitely many states, and a reversible rule maps those states one-to-one, so a state can never fall into a cycle that it isn’t already part of. This lets you prove interesting behavior relatively easily. For example, if two spaceships interact, at least some live pixels must eventually exit the region. If they didn’t, that would imply that there was a finite board size around the interaction area which would not loop back to the original state. Note that this conclusion about interactions carries over to infinite board sizes!
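The looping claim is easy to see experimentally. Here is a rough sketch using the Single Rotation Rule (one of the reversible automata mentioned later in this post), as I understand it: the board is split into 2x2 blocks, any block with exactly one live cell is rotated a quarter turn, and the block partition shifts diagonally by one cell on alternate steps. The clockwise direction, the torus board, and the names `single_rotation_step` and `steps_until_return` are assumptions for illustration, not a reference implementation.

```python
def single_rotation_step(cells, size, phase):
    """Advance one step on a size x size torus (size must be even).

    cells is a set of (row, col) live cells; phase is 0 or 1 and selects
    which of the two alternating 2x2 block partitions is used this step.
    """
    new_cells = set()
    for top in range(phase, size + phase, 2):
        for left in range(phase, size + phase, 2):
            # Corners of this block, listed clockwise from the top-left.
            block = [
                (top % size, left % size),
                (top % size, (left + 1) % size),
                ((top + 1) % size, (left + 1) % size),
                ((top + 1) % size, left % size),
            ]
            live = [cell for cell in block if cell in cells]
            if len(live) == 1:
                # Exactly one live cell: move it one corner clockwise.
                new_cells.add(block[(block.index(live[0]) + 1) % 4])
            else:
                # Any other block is left unchanged.
                new_cells.update(live)
    return frozenset(new_cells)


def steps_until_return(cells, size, max_steps=200_000):
    """Return the first step count at which the board is back at its start."""
    start = frozenset(cells)
    state = start
    for t in range(1, max_steps + 1):
        state = single_rotation_step(state, size, (t - 1) % 2)
        # Only count a return when the partition phase has also reset.
        if t % 2 == 0 and state == start:
            return t
    return None


print(steps_until_return({(0, 0)}, size=4))  # a lone cell circles back after 4 steps
print(steps_until_return({(0, 0), (0, 1), (2, 3)}, size=4))
```

Running the same kind of search on Life would often never find a return, since many Life boards settle into a state (such as the empty board) that never leads back to the start.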

Some reversible rules attain robustness by having conservation of the number of live pixels. The structures won’t necessarily stay the same, but at least they won’t disappear entirely. (Here’s an example rule that does not conserve pixel count.)

Overall, I think it’s worth us doing more research with reversible rules because it appears that our universe has a reversible rule, at least to an extremely good approximation, and so this might lead to more relevant and accurate conclusions about how optimization and agency work in this universe.

I’ve seen three specific (interesting) reversible cellular automata so far, which are Critters, the billiard ball computer, and the Single Rotation Rule.

Critters has an equilibrium state of static-ish areas with gliders constantly migrating between them.
A wide variety of spaceships are available in the Single Rotation Rule.

I don’t want to nerd-snipe alignment researchers, but it might be worth playing around with some of these other automata to see if they are a worthwhile tool. Golly, one of the most developed and widely used cellular automata development platforms, has facilities for defining your own automata rules (and lists Critters on its rules help page). This site is also a very flexible automata simulator, specifically designed for reversible rules. For more discussion, I found this paper (by one of Critters’ creators) very accessible, and also enjoyed this blog post.