Seems like it comes down to the definition of goal-directed. Omohundro uses a chess-playing AI as a motivating example, and intuitively, a chess-playing AI seems “fully goal-directed”. But even as chess and go-playing AIs have become superhuman, and found creative plans humans can’t find, we haven’t seen any examples of them trying to e.g. kill other processes on your computer so they can have more computational resources and play a better game. A theory which can’t explain these observations doesn’t sound very useful.
Maybe this discussion is happening on the wrong level of abstraction. All abstractions are leaky, and abstractions like “intelligent”, “goal-oriented”, “creative plans”, etc. are much leakier than typical computer science abstractions. An hour of looking at the source code is going to be worth five hours of philosophizing. The most valuable thing the AI safety community can do might be to produce a checklist for someone creating the software architecture or reading the source code for the first AGI, so they know what failure modes to look for.
A chess tree search algorithm would never hit upon killing other processes. An evolutionary chess-playing algorithm might learn to do that. It’s not clear whether goal-directed is relevant to that distinction.
That’s not very imaginative. Here’s how a chess tree search algorithm—let’s take AlphaZero for concreteness—could learn to kill other processes, even if it has no explicit action which corresponds to interaction with other processes and is apparently sandboxed (aside from the usual sidechannels like resource use). It’s a variant of the evolutionary algorithm which learned to create a board so large that its competing GAs crashed/were killed while trying to deal with it (the Tic-tac-toe memory bomb). In this case, position evaluations can indirectly reveal that an exploration strategy caused enough memory use to trigger the OOM, killing rival processes, and freeing up resources for the tree search to get a higher win rate by more exploration:
one of the main limits on tree evaluation is memory consumption, due to the exponential growth of breadth-first memory requirements (this is true regardless of whether an explicit tree or an implicit hash-based representation is used); to avoid this, memory consumption is often capped at a fixed amount, or a mix of depth-first and breadth-first strategies is used to tame memory growth, even though this may not be optimal: it may force premature stopping of game-tree expansion (resorting to light/heavy playouts), or force too much depthwise exploitation along a few promising lines of play and too little exploration, etc. (One of the criticisms of AlphaZero, incidentally, was that too little RAM was given to the standard chess engines to permit them to reach their best performance.)
when a computer OS detects that it is running out of memory, it’ll usually invoke an ‘OOM killer’, which may or may not kill the program that made the request that used up the last of the free memory.
so, it is possible that if a tree search algorithm exhausts memory (because the programmer didn’t remember to include a hard limit, the hard limit turns out to be wrong for the machine being trained on, the limit is defined in the wrong terms such as max depth instead of total nodes, etc.), it may not crash or be killed; instead, other programs, using unknown & potentially large percentages of memory, may be killed to free up memory. (I’ve observed this on Linux, to my frustration, where the programs I don’t want killed get killed by the OOM reaper instead of the haywire program; a sketch of how to inspect the kernel’s victim scores appears at the end of this comment.)
once other programs are killed to free up memory, all that memory is now available for the tree search algorithm to use; using this memory will increase performance by allowing more of the game tree to be explicitly evaluated, either wider or deeper.
in AlphaZero, the choice of widening or deepening is inherently controlled by the NN, which is trained to predict the final value of each position (the game result) and so to increase win probability (a sketch of the selection rule involved appears below).
reaching a position which triggers an OOM that kills other programs (a position which can be recognized by its additional complexity, indicating that it lies at a certain additional depth in the tree and thus indirectly revealing how much memory the NN’s cumulative exploration is using) will result in more accurate position evaluations, leading to higher values/higher win probability; so this will reinforce a strategy of aggressively widening early in the game to exhaust memory, waiting for an OOM to happen, and then exploring more aggressively (rather than exploiting depth-first) for the rest of the game, given the new memory.
(Depending on the exact details of how the tree expansion & backups are done, it’s possible that the AlphaZero NN couldn’t observe the benefits of wide-then-deep—it might just look like noise in value estimates—but there are expert iteration variants where the NN directly controls the tree expansion rather than merely providing value estimates for the MCTS algorithm to explore using, and those should be able to observe indirect benefits of exploration strategies over a game.)
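For concreteness, here is a minimal sketch of the PUCT-style selection rule used in AlphaZero-style MCTS (Python; the data layout, function name, and c_puct value are illustrative assumptions, not AlphaZero’s actual code):

```python
import math

def puct_select(children, c_puct=1.5):
    """Pick which child node to visit next, AlphaZero-style (sketch).

    Each child is a dict with:
      'Q': mean value estimate backed up from NN evaluations,
      'P': NN policy prior for the corresponding move,
      'N': visit count so far.
    High Q or P pulls the search deeper along that move; low N keeps
    rarely-visited siblings attractive, which widens the search.
    c_puct trades off the two.
    """
    total_visits = sum(child['N'] for child in children)
    def score(child):
        exploration = c_puct * child['P'] * math.sqrt(total_visits + 1) / (1 + child['N'])
        return child['Q'] + exploration
    return max(children, key=score)

# Example: a strong prior on the second move wins out despite equal value estimates.
children = [{'Q': 0.1, 'P': 0.2, 'N': 10},
            {'Q': 0.1, 'P': 0.7, 'N': 3},
            {'Q': 0.0, 'P': 0.1, 'N': 0}]
print(puct_select(children, c_puct=1.5))
```

The point of showing it: the NN’s priors and values are what decide where the tree grows, so any regularity that makes “widen early” score better, including the memory side-effect described above, is something the training signal can latch onto.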
At no point does it interact directly with other processes, or even know that they exist; it just implicitly learns that expanding a decision tree in a particular wide-then-deep fashion leads to better evaluations more consistent with the true value and/or end-game result (because of side-effects leading to increased resource consumption leading to better performance). And that’s how a tree-search algorithm can hit upon killing other processes.
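To make the OOM-killer step concrete: on Linux you can read the kernel’s per-process “badness” scores and see which process it would sacrifice first, and it is often not the one doing the runaway allocating. A minimal Linux-only sketch (the function name is mine):

```python
import os

def oom_candidates(top_n=5):
    """List the processes the Linux OOM killer currently scores highest.

    Reads /proc/<pid>/oom_score (the kernel's 'badness' heuristic) and
    /proc/<pid>/comm (the process name). The highest-scoring process is
    the likeliest victim when memory runs out -- which need not be the
    process that made the final, fatal allocation.
    """
    scores = []
    for pid in filter(str.isdigit, os.listdir('/proc')):
        try:
            with open(f'/proc/{pid}/oom_score') as f:
                score = int(f.read())
            with open(f'/proc/{pid}/comm') as f:
                name = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited, inaccessible, or unparseable; skip it
        scores.append((score, pid, name))
    return sorted(scores, reverse=True)[:top_n]

if __name__ == '__main__':
    for score, pid, name in oom_candidates():
        print(f'{score:>6}  {pid:>7}  {name}')
```

Whether the kernel actually kills the haywire program or one of its neighbors depends on these scores and on any oom_score_adj settings, which is exactly the “may or may not” above.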
This story seems to reinforce my “leaky abstraction” point. The story hinges on nitty-gritty details of how the AI is implemented and how the operating system manages resources. There’s no obvious usefulness in proving theorems and trying to make grand statements about utility maximizers, optimizers, goal-oriented systems, etc. I expect that, by default, a programmer who tried to apply a theorem of Stuart’s to your chess system would not think to consider these memory-management details (formally verifying a program’s source code says nothing about memory management if that happens lower in the stack). But if they did think to consider these memory-management details, even with no access to Stuart’s theorem, they’d still have a good shot at preventing the problem (by changing the way the NN controls tree expansion, or simply capping the program’s memory use).
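For instance, here is a minimal sketch of the “simply capping the program’s memory use” fix on Linux, assuming the engine runs as a single Python process (the byte limit and function name are illustrative):

```python
import resource

def cap_memory(max_bytes=8 * 2**30, prefer_self_as_oom_victim=True):
    """Bound this process's memory so a runaway tree search fails safely.

    RLIMIT_AS caps the address space: allocations past the cap raise
    MemoryError inside this process instead of exhausting system RAM.
    Writing 1000 to /proc/self/oom_score_adj tells the Linux OOM killer
    to prefer this process if memory somehow runs out anyway, so other
    programs are not sacrificed for the chess engine.
    """
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    if prefer_self_as_oom_victim:
        try:
            with open('/proc/self/oom_score_adj', 'w') as f:
                f.write('1000')
        except OSError:
            pass  # not Linux, or not permitted; the rlimit still applies

cap_memory()  # call once at startup, before the search allocates anything
```

With the rlimit in place, a runaway expansion shows up as a catchable MemoryError inside the engine rather than dragging the whole machine to the OOM killer; the oom_score_adj line is a belt-and-suspenders measure so that, if memory is exhausted anyway, the engine is the victim rather than its neighbors.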
Leaky abstractions are a common cause of computer security problems also. I think this is a big reason why crypto proofs fail so often. A proof is a tower on top of your existing set of abstractions; it’s fairly useless if your existing abstractions are faulty.