And why must alignment be binary? (aligned, or misaligned, where misaligned necessarily means it destroys the world and does not care about property rights)
Why can you not have a superintelligence that is only misaligned when it comes to issues of wealth distribution?
Relatedly, are we sure that CEV is computable?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
And what does it even mean for a superintelligence to be “only misaligned when it comes to issues of wealth distribution”? Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
I guess we could in theory fail and only achieve partial alignment, but that seems like a weird scenario to imagine. Like shooting for a 1 in big_number target (= an aligned mind design in the space of all potential mind designs) and then only grazing it. How would that happen in practice?
Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that do not destroy the world? If so, why? Is it a bigger target? Is it more stable?
Can’t you then just ask your pretty-much-perfectly-aligned entity to align itself on that remaining question?
No, because the “you” who can ask (the people in power) are themselves misaligned with the 1 alignment target that perfectly captures all our preferences.
Are you saying that the 1 aligned mind design in the space of all potential mind designs is an easier target than the subspace composed of mind designs that do not destroy the world?
I didn’t mean that there’s only one aligned mind design, merely that almost all (99.999999...%) conceivable mind designs are unaligned by default, so the only way to survive is if the first AGI is designed to be aligned; there’s no hope that a random AGI just happens to be aligned. And since we’re heading for the latter scenario, it would be very surprising to me if we managed to design a partially aligned AGI and lose that way.
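To make the “tiny target in a huge design space” intuition concrete, here is a toy Monte Carlo sketch. It is purely illustrative: the dimension, the radii, and the idea of scoring “alignment” as distance to a single target point are all invented for the example, not claims about real mind-design space.

```python
import random

# Toy model (assumptions, not real numbers): a "mind design" is a random point
# in a low-dimensional unit cube. "Aligned" means landing inside a tiny ball
# around a target point; "merely doesn't destroy the world" means landing
# inside a much larger ball around the same point.
DIM = 3
ALIGNED_CENTER = [0.5] * DIM
ALIGNED_RADIUS = 0.01   # made-up size of the "aligned" target
SAFE_RADIUS = 0.25      # made-up size of the "non-world-destroying" region

def distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

def random_design():
    return [random.random() for _ in range(DIM)]

trials = 1_000_000
aligned_hits = safe_hits = 0
for _ in range(trials):
    d = distance(random_design(), ALIGNED_CENTER)
    aligned_hits += d <= ALIGNED_RADIUS
    safe_hits += d <= SAFE_RADIUS

print(f"random draws hitting the 'aligned' ball:       {aligned_hits / trials:.6%}")
print(f"random draws hitting the 'merely safe' region: {safe_hits / trials:.6%}")
# Even in 3 dimensions the tiny ball is almost never hit by chance, while the
# larger region is hit a few percent of the time. In higher dimensions both
# fractions shrink and the gap between them grows, which is the
# "1 in big_number" point: a random design essentially never lands on aligned,
# so alignment has to be hit deliberately or not at all.
```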
No, because the “you” who can ask (the people in power) are themselves misaligned with the 1 alignment target that perfectly captures all our preferences.
I expect the people in power are worrying about this way more than they worry about the overwhelming difficulty of building an aligned AGI in the first place. (Case in point: the manufactured AI race with China.) As a result, I expect they’ll succeed at building a by-default-unaligned AGI and driving themselves and us to extinction. So I’m not worried about instead ending up in a dystopia ruled by some government or AI lab owner.