Welp, this scoops a bunch of the stuff in my “Why acausal trade matters” chapter. :D Nice!
The DDT idea amuses me. I guess it’s maybe the best shot we have, but boy do I get a sense of doom when I imagine that the fate of the world depends on our ability to control/steer/oversee AIs as they become more capable than us in many important ways via keeping them dumb in various other important ways. I guess there’s that thing the crocodile wrestlers do where you hold their mouth shut since their muscles for opening are much weaker than their muscles for closing.
I have only skimmed the Cohen et al. paper, so I probably just don’t understand what’s going on, but I don’t think that only using the maximum a posteriori world model helps much. Doesn’t that just mean you ignore (for planning purposes) possibilities other than the most likely one? If so, then that won’t help at all if you think you are probably in a simulation. It would only help in cases where you thought you might be, but probably weren’t.
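To make that worry concrete, here is a toy sketch in Python. Everything in it is hypothetical: the world names, the action names, the payoffs, and both planner functions are made up for illustration, and none of it is taken from the Cohen et al. paper. It just compares a planner that only looks at the single most probable world model with one that weighs all of them.

```python
# Toy illustration only: worlds, actions, and payoffs are invented for this example.

def map_plan(posterior, utilities):
    """Plan using only the single most probable world model (the MAP world)."""
    map_world = max(posterior, key=posterior.get)
    payoffs = utilities[map_world]
    return max(payoffs, key=payoffs.get)


def expected_utility_plan(posterior, utilities):
    """Plan by weighting every world model by its posterior probability."""
    actions = list(next(iter(utilities.values())))

    def expected_utility(action):
        return sum(p * utilities[world][action] for world, p in posterior.items())

    return max(actions, key=expected_utility)


# Hypothetical payoffs: ignoring simulators is very costly if they exist.
UTILITIES = {
    "basement": {"ignore_simulators": 1.0, "appease_simulators": 0.0},
    "simulation": {"ignore_simulators": -10.0, "appease_simulators": 1.0},
}

# Case 1: simulation is possible but not the most likely hypothesis.
mostly_basement = {"basement": 0.7, "simulation": 0.3}
# Case 2: simulation is the most likely hypothesis.
mostly_simulation = {"basement": 0.3, "simulation": 0.7}

for name, posterior in [("mostly_basement", mostly_basement),
                        ("mostly_simulation", mostly_simulation)]:
    print(name,
          "| MAP planner:", map_plan(posterior, UTILITIES),
          "| full-posterior planner:", expected_utility_plan(posterior, UTILITIES))
```

With these made-up numbers, the MAP planner ignores the simulators in the first case while the full-posterior planner appeases them, so restricting to the MAP model does change behavior there. In the second case both planners appease the simulators, which is the situation where the restriction buys nothing.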
One way of looking at DDT is “keeping it dumb in various ways.” I think another way of thinking about it is just designing a different sort of agent, which is “dumb” according to us but not really dumb in an intrinsic sense. You can imagine this DDT agent looking at agents that do do acausal trade and thinking they’re just sacrificing utility for no reason.
There is some slight awkwardness in that, given the decision problems agents in this universe actually encounter, UDT agents will get higher utility than DDT agents.
I agree that the maximum a posteriori world model doesn’t help that much, but I think there is some sense in which “having uncertainty” might be undesirable.
Also: I think making sure our agents are DDT is probably going to be approximately as difficult as making them aligned. Related: Your handle for anthropic uncertainty is:
never reason about anthropic uncertainty. DDT agents always think they know who they are.
“Always think they know who they are” doesn’t cut it; you can think you know you’re in a simulation. I think a more accurate version would be something like “Always think that you are on an original planet, i.e. one in which life appeared ‘naturally,’ rather than a planet in the midst of some larger interstellar civilization, or a simulation of a planet, or whatever.” Basically, you need to believe that you were created by humans but that no intelligence played a role in the creation and/or arrangement of the humans who created you. Or… no role other than the “normal” one in which parents create offspring, governments create institutions, etc. I think this is a fairly specific belief, and I don’t think we have the ability to shape our AIs’ beliefs with that much precision, at least not yet.