Why do you think that you should conclude that pushing the button is a losing decision upon observing evidence that the digit is odd, but the AI should not? Is a different epistemology and decision theory ideal for you than what is ideal for the AI?
I think that the AI should be perfectly aware that it is a losing decision (in the sense that it should be able to conclude that it wipes out humanity with certainty), but I think that you should program it to make that decision anyway (by programming it to be an updateless decider, not by special-casing, obviously).
The reason is that programming it that way maximizes the utility you expect at the time you program the AI: you can only preserve humanity in one possible future if you make the AI knowingly destroy humanity in the other.
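To make the expected-utility claim concrete, here is a minimal sketch in Python. Everything numeric in it is an assumption made up for illustration, not part of the scenario under discussion: the 50/50 credence at programming time, the four payoff values, and the names `PRIOR`, `UTILITY`, `updateless_choice` and `updating_choice` are all hypothetical. The only structure taken from the argument is that a "push" policy preserves humanity in the even world and destroys it in the odd world.

```python
# Assumed credence of the programmer before the digit is known (hypothetical).
PRIOR = {"even": 0.5, "odd": 0.5}

# Assumed utilities for each (policy, world) pair (hypothetical numbers).
UTILITY = {
    ("push", "even"): 1.0,    # humanity is preserved
    ("push", "odd"): 0.0,     # the AI knowingly wipes out humanity
    ("refuse", "even"): 0.0,  # humanity is not preserved
    ("refuse", "odd"): 0.5,   # assumed: some lesser outcome than full preservation
}

POLICIES = ("push", "refuse")


def expected_utility(policy, credence):
    """Expected utility of a policy under a credence over worlds."""
    return sum(p * UTILITY[(policy, world)] for world, p in credence.items())


def updateless_choice(prior):
    """Pick the policy that is best under the programming-time credence."""
    return max(POLICIES, key=lambda policy: expected_utility(policy, prior))


def updating_choice(observed_world):
    """Pick the action that is best after conditioning on the observed world."""
    posterior = {w: 1.0 if w == observed_world else 0.0 for w in PRIOR}
    return max(POLICIES, key=lambda policy: expected_utility(policy, posterior))


if __name__ == "__main__":
    print(updateless_choice(PRIOR))  # policy chosen at programming time
    print(updating_choice("odd"))    # action chosen after seeing "odd"
```

Under these made-up numbers the two deciders come apart: the updateless decider commits to pushing, while a decider that conditions on "odd" refuses. With different payoffs the comparison could go the other way, so this only illustrates the shape of the argument, not its conclusion.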
I guess the short answer to your question is that I think it’s sensible to discuss what a human should do, including how a human should build an AI, but not how “an AI should act” (in any other sense than how a human should build an AI to act); after all, a human might listen to advice, but a well-designed AI probably shouldn’t.
If we’re discussing the question of how a human should build an AI (or modify themselves, if they can), I think they should maximize their expected payoff and make the AI (or themselves) an updateless decider. But that’s because it is their best possible choice according to their knowledge at that point in time, not because it’s the best possible choice according to timeless philosophical ideals. So I don’t conclude that humans should make the choice that would have been their best possible bet a million years ago but is terrible according to the information they in fact have now.