Is it possible to give just the merest of hints about what the theorem might have to do with AI?
Qiaochu Yuan, a past MIRI workshop participant, gave a concise answer:
Suppose you want to design an AI which in turn designs its (smarter) descendants. You’d like to have some guarantee that not only the AI but its descendants will do what you want them to do; call that goal G. As a toy model, suppose the AI works by storing and proving first-order statements about a model of the environment, then performing an action A as soon as it can prove that action A accomplishes goal G. This action criterion should apply to any action the AI takes, including the production of its descendants. So it would be nice if the AI could prove that if its descendants prove that action A leads to goal G, then action A in fact leads to goal G.
The problem is that if the AI and its descendants all believe the same amount of mathematics, say PA, then by Lob’s theorem this implies that the AI can already prove that action A leads to goal G. So it must already do the cognitive work that it wants its smarter descendants to do, which raises the question of why it needs to build those descendants in the first place. So in this toy model Lob’s theorem appears as a barrier to an AI designing descendants which it both can’t simulate but can provably trust.
Qiaochu’s answer seems off. The argument that the parent AI can already prove what it wants the successor AI to prove and therefore isn’t building a more powerful successor, isn’t very compelling because being able to prove things is a different problem than searching for useful things to prove. It also doesn’t encompass what I understand to be the Lobian obstacle, that being able to prove that if your own mathematical system proves something that thing is true implies that your system is inconsistent.
It’s entirely possible that my understanding is incomplete, but that was my interpretation of an explanation Eliezer gave me once. Two comments: first, this toy model is ignoring the question of how to go about searching for useful things to prove; you can think of the AI and its descendants as trying to determine whether or not any action leads to goal G. Second, it’s true that the AI can’t reflectively trust itself and that this is a problem, but the AI’s action criterion doesn’t require that it reflectively trust itself to perform actions. However, it does require that it trust its descendants to construct its descendants.
Reproduced for convenience...
On G+, John Baez wrote about the MIRI workshop he’s currently attending, in particular about Löb’s Theorem.
Timothy Gowers asked:
Qiaochu Yuan, a past MIRI workshop participant, gave a concise answer:
Qiaochu’s answer seems off. The argument that the parent AI can already prove what it wants the successor AI to prove and therefore isn’t building a more powerful successor, isn’t very compelling because being able to prove things is a different problem than searching for useful things to prove. It also doesn’t encompass what I understand to be the Lobian obstacle, that being able to prove that if your own mathematical system proves something that thing is true implies that your system is inconsistent.
Is there more context on this?
It’s entirely possible that my understanding is incomplete, but that was my interpretation of an explanation Eliezer gave me once. Two comments: first, this toy model is ignoring the question of how to go about searching for useful things to prove; you can think of the AI and its descendants as trying to determine whether or not any action leads to goal G. Second, it’s true that the AI can’t reflectively trust itself and that this is a problem, but the AI’s action criterion doesn’t require that it reflectively trust itself to perform actions. However, it does require that it trust its descendants to construct its descendants.