I’m under the impression that all these “Löbian obstacles” essentially boil down to the “you can’t be sure that you are not insane” kind of problem.
IIUC, your thesis-advisor scenario can be solved by having a first thesis advisor who only outputs theorems provable within some fixed formal system, and every subsequent thesis advisor trusting its predecessor by axiom. This doesn’t work in your (MIRI’s) framework because you want the agent to trust not its ancestors but its successors, and you want this to hold for an unbounded number of generations. But this may just be a problem with your framework, not some fundamental property of computational agents. At least, you haven’t produced a satisfactory argument for the latter.
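For concreteness, here is one way that chain could be formalized (notation mine, not from the original post): the base advisor proves theorems in some sound theory, and each later advisor adds a reflection schema for its predecessor.

```latex
% Advisor 0 works in a sound base theory; advisor n+1 trusts advisor n
% by axiom, via the uniform reflection schema over T_n.
\begin{align*}
T_0     &= \mathrm{PA} \quad (\text{or any sound base theory}) \\
T_{n+1} &= T_n + \{\, \mathrm{Prov}_{T_n}(\ulcorner \varphi \urcorner)
           \rightarrow \varphi \;:\; \varphi \text{ a sentence} \,\}
\end{align*}
```

This is fine going backward: each $T_{n+1}$ is strictly stronger than $T_n$ and trusts it wholesale. Löb’s theorem is exactly what blocks the forward direction: no consistent $T_n$ can prove its own reflection schema, so an agent cannot use the same trick to trust a successor at least as strong as itself.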
Anyway, I think that framing these problems in terms of provability logic may be inappropriate: concepts like “beliefs”, “trust”, or “agents” are difficult to formalize in that language, so you end up with a mix of formal and informal reasoning, which is the traditional recipe for paradoxes and errors.
It’s much better to frame these problems directly in terms of computer programs. That way, agents, internal states, world physics, etc. can be defined much more precisely.
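For instance (a toy sketch; all of the definitions below are mine):

```python
from typing import Callable

# A toy formalization: the world physics steps a state given an action,
# and an agent maps its observation (here, the whole state) to an action.
State, Action = int, int
Agent = Callable[[State], Action]
World = Callable[[State, Action], State]

def run(agent: Agent, world: World, state: State, steps: int) -> State:
    """Agent, internal state, and physics are all just programs."""
    for _ in range(steps):
        state = world(state, agent(state))
    return state
```

Everything here is an ordinary program, so questions about what an agent “believes” or “trusts” become questions about what its code actually computes.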
In this framework the main “Gödelian” result is Rice’s theorem, which, among other things, implies that functional equivalence between arbitrary programs is undecidable. Nevertheless, there exist computable under-approximations of functional equivalence, and these can be exploited by “cliquing” strategies in scenarios where programs have to “trust” each other (e.g. the program-equilibrium prisoner’s dilemma). Cliquing based on a lexical-equivalence predicate is obviously restrictive, but it can be generalized to any predicate that under-approximates functional equivalence. Of course, this assumes that you start with a bug-free program, that the OS and compiler are also bug-free, that the hardware never glitches, and so on. Unrealistic as it is, this assumption seems unavoidable: no program will ever be able to fully debug itself, or to certify that the hardware it runs on works as advertised.
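A minimal sketch of what I mean by cliquing (illustrative names, not from any particular paper): in the program-equilibrium prisoner’s dilemma each player receives the other’s source code, and a clique bot cooperates exactly when a computable under-approximation of functional equivalence accepts the opponent; lexical equality is the simplest such predicate.

```python
import inspect

COOPERATE, DEFECT = "C", "D"

def lexical_equivalence(src_a: str, src_b: str) -> bool:
    """The crudest computable under-approximation of functional
    equivalence: the two programs have identical source text."""
    return src_a == src_b

def clique_bot(opponent_source: str) -> str:
    """Cooperate iff the opponent is recognizably in our clique,
    here judged by lexical equality with our own source."""
    my_source = inspect.getsource(clique_bot)
    return COOPERATE if lexical_equivalence(my_source, opponent_source) else DEFECT
```

So `clique_bot(inspect.getsource(clique_bot))` cooperates, while any other opponent gets defection. Swapping `lexical_equivalence` for any sound computable predicate, i.e. one that never accepts a non-equivalent program (say, equality up to whitespace and renaming, or checking a machine-verifiable equivalence proof shipped with the source), enlarges the clique without making it exploitable; Rice’s theorem only rules out a predicate that is both sound and complete.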
Yeah, good points. Previously discussed in this post and this comment. (I see you also commented there.) I’d love to see more progress in that direction.
Thanks. Added a comment in the WD/EY subthread.