That’s not very imaginative. Here’s how a chess tree search algorithm—let’s take AlphaZero for concreteness—could learn to kill other processes, even if it has no explicit action which corresponds to interaction with other processes and is apparently sandboxed (aside from the usual sidechannels like resource use). It’s a variant of the evolutionary algorithm which learned to create a board so large that its competing GAs crashed/were killed while trying to deal with it (the Tic-tac-toe memory bomb). In this case, position evaluations can indirectly reveal that an exploration strategy caused enough memory use to trigger the OOM, killing rival processes, and freeing up resources for the tree search to get a higher win rate by more exploration:
one of the main limits to tree evaluation is memory consumption, due to the exponential growth of breadth-first memory requirements (this is true regardless of whether an explicit tree or implicit hash-based representation is used); to avoid this, memory consumption is often limited to a fixed amount of memory or a mix of depth/breadth-first strategies are used to tame memory growth, even though this may not be optimal, as it may force premature stopping to expansion of the game tree (resorting to light/heavy playouts) or force too much exploitation depthwise along a few promising lines of play and too little exploration etc. (One of the criticisms of AlphaZero, incidentally, was that too little RAM was given to the standard chess engines to permit them to reach their best performance.)
when a computer OS detects running out of memory, it’ll usually invoke an ‘OOM killer’, which may or may not kill the program which makes the request which uses up the last of free memory
so, it is possible that if a tree search algorithm exhausts memory (because the programmer didn’t remember to include a hard limit, the hard limit turns out to be incorrect for the machine being trained on, the limit is defined wrong like in terms of max depth instead of total nodes, etc), it may not crash or be killed but other programs, using unknown & potentially large percentages of memory, may be killed instead to free up memory. (I’ve observed this on Linux, to my frustration, where the programs I don’t want killed get killed by the OOM reaper instead of the haywire program.)
once other programs are killed to free up memory, all that memory is now available for the tree search algorithm to use; using this memory will increase performance by allowing more of the game tree to be explicitly evaluated, either wider or deeper.
in AlphaZero, the choice of widening or deepening is inherently controlled by the NN, which is trained to predict the result of the final values of each position and increase win probabilities.
reaching a position (which can be recognized by its additional complexity, indicating it lies at a certain additional depth in the tree and thus indirectly reveals how much memory is being used by the NN’s cumulative exploration) which triggers an OOM killing other programs will result in more accurate position evaluations, leading to higher values/higher win probability; so it will reinforce a strategy where it learns to aggressively widen early in the game to exhaust memory, waits for an OOM to happen, and then in the rest of the game proceeds to explore more aggressively (rather than depth-first exploit) given the new memory.
(Depending on the exact details of how the tree expansion & backups are done, it’s possible that the AlphaZero NN couldn’t observe the benefits of wide-then-deep—it might just look like noise in value estimates—but there are expert iteration variants where the NN directly controls the tree expansion rather than merely providing value estimates for the MCTS algorithm to explore using, and those should be able to observe indirect benefits of exploration strategies over a game.)
At no point does it interact directly with other processes, or even know that they exist; it just implicitly learns that expanding a decision tree in a particular wide-then-deep fashion leads to better evaluations more consistent with the true value and/or end-game result (because of side-effects leading to increased resource consumption leading to better performance). And that’s how a tree-search algorithm can hit upon killing other processes.
This story seems to reinforce my “leaky abstraction” point. The story hinges on nitty gritty details of how the AI is implemented and how the operating system manages resources. There’s no obvious usefulness in proving theorems and trying to make grand statements about utility maximizers, optimizers, goal-oriented systems, etc. I expect that by default, a programmer who tried to apply a theorem of Stuart’s to your chess system would not think to consider these details related to memory management (formally verifying a program’s source code says nothing about memory management if that happens lower in the stack). But if they did think to consider these details of memory management, having no access to Stuart’s theorem, they’d still have a good shot at preventing the problem (by changing the way the NN controls tree expansion or simply capping the program’s memory use).
Leaky abstractions are a common cause of computer security problems also. I think this is a big reason why crypto proofs fail so often. A proof is a tower on top of your existing set of abstractions; it’s fairly useless if your existing abstractions are faulty.
What I like about this thread, and why I’m worried about people reading this post and updating away from thinking that sufficiently powerful processes that don’t look like what we think are dangerous is safe, is that it helps make clear that Rohin seems to be making an argument that hinges on leaky or even confused abstractions. I’m not sure any of the rest of us have much better abstractions to offer that aren’t leaky, and I want to encourage what Rohin does in this post of thinking through the implications of the abstractions he’s using to draw conclusions that are specific enough to be critiqued, because through a process like this we can get a clearer idea of where we have shared confusion and then work to resolve it.
The argument Rohin is responding to also rests on leaky abstractions, I would argue.
At the end of the day sometimes the best approach, if there aren’t any good abstractions in a particular domain, is to set aside your abstractions and look directly at the object level.
If there is a fairly simple, robust FAI design out there, and we rule out the swath of design space it resides in based on an incorrect inference from a leaky abstraction, that would be a bad outcome.