This story seems to reinforce my “leaky abstraction” point. The story hinges on nitty gritty details of how the AI is implemented and how the operating system manages resources. There’s no obvious usefulness in proving theorems and trying to make grand statements about utility maximizers, optimizers, goal-oriented systems, etc. I expect that by default, a programmer who tried to apply a theorem of Stuart’s to your chess system would not think to consider these details related to memory management (formally verifying a program’s source code says nothing about memory management if that happens lower in the stack). But if they did think to consider these details of memory management, having no access to Stuart’s theorem, they’d still have a good shot at preventing the problem (by changing the way the NN controls tree expansion or simply capping the program’s memory use).
Leaky abstractions are a common cause of computer security problems also. I think this is a big reason why crypto proofs fail so often. A proof is a tower on top of your existing set of abstractions; it’s fairly useless if your existing abstractions are faulty.
What I like about this thread, and why I’m worried about people reading this post and updating away from thinking that sufficiently powerful processes that don’t look like what we think are dangerous is safe, is that it helps make clear that Rohin seems to be making an argument that hinges on leaky or even confused abstractions. I’m not sure any of the rest of us have much better abstractions to offer that aren’t leaky, and I want to encourage what Rohin does in this post of thinking through the implications of the abstractions he’s using to draw conclusions that are specific enough to be critiqued, because through a process like this we can get a clearer idea of where we have shared confusion and then work to resolve it.
The argument Rohin is responding to also rests on leaky abstractions, I would argue.
At the end of the day sometimes the best approach, if there aren’t any good abstractions in a particular domain, is to set aside your abstractions and look directly at the object level.
If there is a fairly simple, robust FAI design out there, and we rule out the swath of design space it resides in based on an incorrect inference from a leaky abstraction, that would be a bad outcome.