I think this process is pointless: if the 10kg+10kg agent can reason about the 12kg agent and faithfully prove things about it, then it is already more powerful than the 12kg agent, so there’s no reason to upgrade to heavier agents.
Within the metaphor, that is true, but the metaphor of mass is very leaky here. The original 10kg agent, no matter how much empirical data it takes in, can only learn at a certain rate (at least, if it works like current AI designs), because its compression algorithm is only so good. It is limited in how fast it can learn from data, in how well it can generalize, and in its ability to prove things as logical theorems rather than merely believe them as empirically supported.
That is why it wants to build the 12kg agent in the first place: the “larger” agent can compress data more efficiently, and so needs fewer samples to learn further correct conclusions (on top of the existing 8kg of empirical data).
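To make the compression point a bit more concrete, here is a rough sketch using the standard Occam's-razor bound from PAC learning: in the realizable case, a hypothesis that is consistent with the data and has a prefix-free description of k bits has error at most epsilon with probability at least 1 - delta once you have about (k ln 2 + ln(1/delta)) / epsilon samples. Treating "better compression" as "shorter description of the same regularity" is my own simplification, and the function name and the bit counts below are made up purely for illustration.

```python
import math


def occam_sample_bound(description_bits, epsilon, delta):
    """Number of samples sufficient, under the Occam bound, for a consistent
    hypothesis of the given description length (in bits) to have error
    at most epsilon with probability at least 1 - delta."""
    numerator = description_bits * math.log(2) + math.log(1.0 / delta)
    return math.ceil(numerator / epsilon)


# Hypothetical numbers: both agents must capture the same regularity, but the
# "larger" agent's better compressor encodes it in fewer bits.
smaller_agent_samples = occam_sample_bound(description_bits=1000, epsilon=0.05, delta=0.01)
larger_agent_samples = occam_sample_bound(description_bits=600, epsilon=0.05, delta=0.01)

print("smaller agent needs about", smaller_agent_samples, "samples")
print("larger agent needs about", larger_agent_samples, "samples")
```

The absolute numbers mean nothing here; the point is only that the bound grows linearly in description length, so an agent that compresses better reaches the same confidence from less data.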
I can think of other ways for AI to work, and am pursuing those (which includes building increasingly sophisticated little Prisoners’ Dilemma bots that achieve reliable cooperation in the one-shot game). But even those, if we wanted them to self-improve, would eventually have to reason about smaller versions of themselves. That leaves us right back at algorithmic information theory, trying to avoid the paradox theorems that arise when you reason about something about as complex as yourself.