Notice that all those desiderata are much easier to satisfy when the AI knows our (extrapolated) preferences. It is not at all clear that they can be achieved otherwise.
It seems like, as long as she wanted to, a human Alice could satisfy these desiderata when helping Bob, even though Alice doesn’t know Bob’s extrapolated preferences? So I’m not sure why you think an intelligent AI couldn’t do the same.
Maybe you think that it’s because Alice and Bob are both humans? But I also think Alice could satisfy these desiderata when helping an alien from a different planet—she would definitely make some mistakes, but presumably not the existentially catastrophic variety*.
*unless the alien has some really unusual values where an existential catastrophe can be caused by accident, e.g. “if anyone ever utters the word $WORD, that is the worst possible universe”, but those sorts of values seem very structurally different than human values.
I actually don’t think that Alice could help a (sufficiently alien) alien. She needs an alien theory of mind to understand what the alien wants, how they would extrapolate, how to help that extrapolation without manipulating it, and so on. Without that, she’s just projecting human assumptions onto alien behaviour and statements.
She needs an alien theory of mind to understand what the alien wants
Absolutely, I would think that the first order of business would be to learn that alien theory of mind (and be very conservative until that’s done).
Maybe you’re saying that this alien theory of mind is unlearnable, even for a very intelligent Alice? That seems pretty surprising, and I don’t feel the force of that intuition (despite the Occam’s razor impossibility result).
Developing this idea a bit: https://www.lesswrong.com/posts/kMJxwCZ4mc9w4ezbs/how-an-alien-theory-of-mind-might-be-unlearnable