Qiaochu’s answer seems off. The argument that the parent AI can already prove whatever it wants the successor AI to prove, and therefore isn’t really building a more powerful successor, isn’t very compelling, because being able to prove things is a different problem from searching for useful things to prove. It also doesn’t address what I understand to be the Löbian obstacle: by Löb’s theorem, if your formal system can prove that anything it proves is true, then your system is inconsistent.
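For reference, here is a rough sketch of the statement I have in mind, in standard provability-logic notation where $\Box P$ abbreviates "the system $T$ proves $P$" (this is my own summary, not something from the thread):

```latex
% Löb's theorem: for any sentence P,
%   if  T |- (Box P -> P),  then  T |- P.
\text{If } T \vdash (\Box P \rightarrow P), \text{ then } T \vdash P.

% Consequence: if T proved the soundness schema  Box P -> P
% for every sentence P -- in particular for P = \bot --
% then T |- \bot, i.e. T would be inconsistent.
```

So a consistent system cannot prove its own soundness schema, which is the obstacle to an AI fully trusting a successor that reasons in the same (or a stronger) system.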
It’s entirely possible that my understanding is incomplete, but that was my interpretation of an explanation Eliezer gave me once. Two comments: first, this toy model ignores the question of how to search for useful things to prove; you can think of the AI and its descendants as simply trying to determine whether any given action leads to goal G. Second, it’s true that the AI can’t reflectively trust itself, and that this is a problem, but the AI’s action criterion doesn’t require that it reflectively trust itself in order to perform actions. However, it does require that it trust its descendants in order to construct them.
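As I understand the toy model (roughly the setup in the tiling-agents literature; the notation below is my own, not from the thread), the asymmetry between acting and constructing a successor can be sketched like this:

```latex
% Action criterion: the agent performs action a only if its
% theory T proves that doing a achieves the goal G:
%     T \vdash (\mathrm{do}(a) \rightarrow G)
% This needs no reflective self-trust: the agent proves facts
% about actions, not "whatever T proves is true".

% But constructing a successor that acts on proofs in T' means
% the parent must prove, schematically over the successor's
% possible actions b:
%     T \vdash \big( \Box_{T'}(\mathrm{do}(b) \rightarrow G)
%                    \rightarrow (\mathrm{do}(b) \rightarrow G) \big)
% i.e. it must trust T' 's proofs -- and by Löb's theorem it
% cannot prove this schema for T' = T (or stronger) while
% remaining consistent.
```

This is why the trust problem bites at construction time rather than at every ordinary action.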
Is there more context on this?