I now understand what you are saying; thank you for being patient.
What I am confused by now is your optimism. I presented an argument based on incompleteness which convinced me that reasonable ADT agents won’t cooperate. In response you pointed out that incompleteness isn’t really the problem—there are other failure modes anyway. So why is it that you believe “reasonable” ADT agents will in fact cooperate, when I (unaware of this other failure mode) already believed they wouldn’t?
Part of what convinces me that something bad is going on is the following. Consider an agent in Newcomb’s problem who searches exhaustively for a proof that two-boxing is better than one-boxing; if he finds one, he will two-box, and if he cannot, he will one-box by default. By an argument like the one given in “AI coordination in practice,” this agent will two-box.
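To make the agent’s structure concrete, here is a minimal sketch. The exhaustive proof search is a hypothetical stand-in, modeled as a finite set of statements stipulated to be provable, since a genuine search over all formal proofs is not implementable in general; every name in the sketch is illustrative rather than anything fixed by the discussion.

```python
# Toy model of the proof-searching Newcomb agent described above.
# The "exhaustive proof search" is a hypothetical stand-in: membership
# in a finite set of statements we simply stipulate to be provable.

def newcomb_agent(provable_statements: set[str]) -> str:
    """One-box by default; two-box only if a proof of its superiority turns up."""
    if "U(two-box) > U(one-box)" in provable_statements:
        return "two-box"  # found a proof that two-boxing is better: act on it
    return "one-box"      # search came up empty: fall back to the default

# The argument referenced above claims that, for an agent with this
# structure, the comparison does end up provable, so the agent two-boxes:
print(newcomb_agent({"U(two-box) > U(one-box)"}))  # -> two-box
print(newcomb_agent(set()))                        # -> one-box
```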
Your optimism seems to depend heavily on the difference between “enumerating all proofs up to a certain length and looking for the best provable utility guarantees” and “enumerating proofs until you find a complete set of moral arguments, and behaving randomly if you can’t.” Why do you believe that a complete set of moral arguments is provable in reasonable situations? Do you know of some non-trivial example where this is really the case?
> Your optimism seems to depend heavily on the difference between “enumerating all proofs up to a certain length and looking for the best provable utility guarantees” and “enumerating proofs until you find a complete set of moral arguments, and behaving randomly if you can’t.”
Yes, and this answers your preceding question:
> In response you pointed out that incompleteness isn’t really the problem—there are other failure modes anyway.
The strategy of “enumerating proofs until you find a complete set of moral arguments” doesn’t suffer from the incompleteness issue (whatever that issue is, if it is indeed there at all; I doubt it can have the simple form you referred to).
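The structural difference between the two strategies may be easier to see side by side. The following is a toy sketch under obviously unrealistic assumptions: both proof searches are modeled as precomputed lookup structures, and all names are illustrative, not anything from the discussion above.

```python
import random
from typing import Optional

# Strategy 1: enumerate all proofs up to some length and act on the best
# provable utility guarantee found, however partial the picture is.
# `guarantees` stands in for the proven lower bounds the search produced.
def best_provable_guarantee(guarantees: dict[str, float],
                            actions: list[str]) -> str:
    return max(actions, key=lambda a: guarantees.get(a, float("-inf")))

# Strategy 2: act only if the search produced a *complete* set of moral
# arguments, modeled here as a full ranking of the actions; otherwise
# behave randomly rather than acting on a partial result.
def complete_moral_arguments(ranking: Optional[list[str]],
                             actions: list[str]) -> str:
    if ranking is not None and set(ranking) == set(actions):
        return ranking[0]          # complete ranking found: take its top action
    return random.choice(actions)  # incomplete: the stated random fallback
```

The contrast the quoted sentence points at: the first rule always acts on whatever partial guarantees happened to be provable, while the second refuses to act on anything short of a complete set of arguments, which is why the incompleteness-style worry does not get a grip on it in the same way.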
> Why do you believe that a complete set of moral arguments is provable in reasonable situations?
I don’t believe it is provable in any reasonable time, but perhaps, given enough time, it can often be proven. Building a set of mathematical tools for reasoning about this might prove a fruitful exercise, but I shelved this line of inquiry a few months ago and haven’t been working on it.