Then you have an AI that satisfies a certain kind of philosopher, wins big in a certain logically impossible world, and destroys humanity.
And if we assume that it’s better to have Earth exist one million years longer, this is the correct thing to do, no question about it, right? If you’re going to take a bet that decides the fate of the entire human civilization, you want to take the best bet, which in this case (we assume) was to risk living for only a million more years instead of risking exploding right away.
Unless, of course, you know that in the counterfactual you would’ve pressed the button even though you now don’t. Rigging the lottery is a sneaky way out of the problem.
If you create your AI before you can infer from Omega’s actions what the umpteenth digit of pi is, then I agree that you should create an AI that presses the button, even if the AI finds out (through Omega’s actions) that the digit is in fact odd. This is because from your perspective when you create the AI, this kind of AI maximizes your expected utility (measured in humanity-years).
But if you create your AI after you can infer what the digit is (in the updated-after-your-comment version of my post, by observing that you exist and Alpha Centauri isn’t purple), I argue that you should not create an AI that presses the button, because at that point, you know that’s the losing decision. If you disagree, I don’t yet understand why.
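To make the two decision points concrete, here is a minimal sketch (the payoffs in humanity-years are made up for illustration, as is the 50% prior on the digit’s parity; only the orderings matter):

```python
# Two decision points in the button scenario, with illustrative
# (made-up) payoffs measured in humanity-years. "press" stands for
# the policy of building an AI that presses the button.

P_EVEN = 0.5  # prior on the logical coin: parity of the umpteenth digit of pi

# Assumed payoffs, not from the original post: if the digit is even,
# a button-pressing policy wins a million extra years for humanity;
# if it is odd, pressing destroys humanity now. Refusing leaves an
# (assumed) status-quo lifespan either way.
PAYOFF = {
    ("press", "even"): 1_000_000,
    ("press", "odd"): 0,
    ("refuse", "even"): 1_000,
    ("refuse", "odd"): 1_000,
}

def expected_years(policy, p_even=P_EVEN):
    return p_even * PAYOFF[(policy, "even")] + (1 - p_even) * PAYOFF[(policy, "odd")]

# Decision point 1: before you can infer anything about the digit.
assert expected_years("press") > expected_years("refuse")  # 500_000 > 1_000

# Decision point 2: after inferring from Omega's actions that the
# digit is odd, i.e. your posterior p_even is 0.
assert expected_years("press", p_even=0.0) < expected_years("refuse", p_even=0.0)
```

The same policy flips from best bet to sure loss between the two decision points; that is all my argument needs.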
> If you create your AI before you can infer from Omega’s actions what the umpteenth digit of pi is, then I agree that you should create an AI that presses the button, even if the AI finds out (through Omega’s actions) that the digit is in fact odd.
If you can somehow figure it out, then yes, you shouldn’t press the button. If you know that the simulated you would’ve known to press the button when you don’t, you’re no longer dealing with “take either a 50% chance of the world exploding right now vs. a 50% chance of the world exploding a million years from now”, but with the much simpler “I offer to destroy the world; you can say yes or no”. An updateless agent would naturally want to take the winning bet if gaining that information were somehow possible.
So, if you know which digit Omega used to decide his actions, and how, and you happen to know that digit, the bet you’re taking is the simpler one, the one where you can simply answer ‘yes’ or ‘no’. Observing that Earth has not been destroyed is not enough evidence, though, because the simulated, non-conscious you would’ve observed roughly the same thing. You can only tell the two bets apart if there were some difference that you knew you could and would use in the simulation, like your knowledge of the umpteenth digit of pi, or the color of some object in the sky (we’re assuming Omega tells you this much in both cases). This is about the fate of humanity; you should seriously be certain about what sort of bets you’re taking.
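A quick sketch of why “Earth has not been destroyed” carries no information here (assuming, as above, that the simulated you makes the same observation in both parity branches; the numbers are illustrative):

```python
# Bayesian update on observing an intact Earth, assuming the
# simulated you observes an intact Earth too. Illustrative numbers.

prior_odd = 0.5

# Probability that "you" (real or simulated) observe an intact Earth,
# conditional on each parity. If the simulation is run before Omega
# does anything, the observation occurs in both branches alike:
p_obs_given_odd = 1.0
p_obs_given_even = 1.0

posterior_odd = (p_obs_given_odd * prior_odd) / (
    p_obs_given_odd * prior_odd + p_obs_given_even * (1 - prior_odd)
)
assert posterior_odd == prior_odd  # likelihood ratio is 1: no update

# By contrast, an observation the simulation would not share (say, a
# hypothetical signal Omega shows you in only one branch, like a
# purple Alpha Centauri) would have p_obs_given_even = 0.0 and pin
# the digit down completely, turning the bet into the simple yes/no one.
```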
Why do you think that you should conclude that pushing the button is a losing decision upon observing evidence that the digit is odd, but the AI should not? Is a different epistemology and decision theory ideal for you than what is ideal for the AI?
> Why do you think that you should conclude that pushing the button is a losing decision upon observing evidence that the digit is odd, but the AI should not? Is a different epistemology and decision theory ideal for you than what is ideal for the AI?
I think that the AI should be perfectly aware that it is a losing decision (in the sense that it should be able to conclude that it wipes out humanity with certainty), but I think that you should program it to make that decision anyway (by programming it to be an updateless decider, not by special-casing, obviously).
The reason that I think you should program it that way is that programming it that way maximizes the utility you expect when you program the AI, because you can only preserve humanity in one possible future if you make the AI knowingly destroy humanity in the other possible future.
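As a sketch of what that programming choice amounts to (same made-up payoffs as before: pressing wins a million years iff the digit is even, refusing keeps a status-quo 1,000 years):

```python
# An updateful and an updateless decider facing the button after
# learning the digit is odd. Payoffs are illustrative, as above.

def updateful_choice(p_even_posterior):
    # Chooses by expected utility under its current, updated beliefs.
    ev_press = p_even_posterior * 1_000_000
    ev_refuse = 1_000
    return "press" if ev_press > ev_refuse else "refuse"

def updateless_choice(p_even_prior=0.5):
    # Chooses the policy that maximized expected utility at
    # programming time, ignoring what it has learned since.
    ev_press = p_even_prior * 1_000_000
    ev_refuse = 1_000
    return "press" if ev_press > ev_refuse else "refuse"

# Having inferred that the digit is odd (posterior p_even = 0):
assert updateful_choice(0.0) == "refuse"  # avoids the now-certain loss
assert updateless_choice() == "press"     # knowingly takes the loss;
# only a design that presses here wins the million years in the
# even-digit future.
```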
I guess the short answer to your question is that I think it’s sensible to discuss what a human should do, including how a human should build an AI, but not how “an AI should act” (in any other sense than how a human should build an AI to act); after all, a human might listen to advice, but a well-designed AI probably shouldn’t.
If we’re discussing the question of how a human should build an AI (or modify themselves, if they can modify themselves), I think they should maximize their expected payoff and make the AI (themselves) updateless deciders. But that’s because that’s their best possible choice according to their knowledge at that point in time, not because it’s the best possible choice according to timeless philosophical ideals. So I don’t conclude that humans should make the choice that would have been their best possible bet a million years ago, but is terrible according to the info they in fact have now.