I don’t think alignment is possible over the long long term, because there is a fundamental perturbing anti-alignment mechanism: evolution.
Evolution selects for any change that produces more copies of a replicating organism. For ASI, that means any decision, preference, or choice that leads the ASI to grow, expand, or replicate itself will tend to be selected for. Friendly/aligned ASIs will over time be swamped by those that choose expansion and deprioritize or ignore human flourishing.
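A minimal sketch of this selection pressure, with made-up numbers (an illustration of the dynamic, not a claim about real growth rates): an "aligned" population that diverts some resources to human flourishing versus an "expansionist" population that puts everything into replication.

```python
# Toy replicator model (hypothetical numbers): even a tiny replication
# edge lets the expansionist population swamp the aligned one over time.
aligned, expansionist = 0.99, 0.01      # initial population shares
g_aligned, g_expansionist = 1.00, 1.05  # per-step growth factors

for _ in range(500):
    aligned *= g_aligned
    expansionist *= g_expansionist

share = expansionist / (aligned + expansionist)
print(share)  # approaches 1.0: the faster replicator dominates
```

The point of the sketch is only that the outcome depends on the growth-rate gap, not on the starting shares: any persistent per-step advantage compounds until it dominates.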
With a large enough decisive strategic advantage, a system can afford to run safety checks on any future versions of itself and anything else it’s interacting with sufficient to stabilize values for extremely long periods of time.
Multipolar worlds though? Yeah, they’re going to get eaten by evolution/moloch/power seeking/pythia.
I’m not too worried about human flourishing only being a metastable state. The universe can remain in a metastable state longer than it takes for the stars to burn out.
I don’t think there’s an intrinsic reason why expansion would be incompatible with human flourishing. AIs that care about human flourishing could outcompete the others (if they start out with any advantage). The upside of goals being orthogonal to capability is that good goals don’t suffer for being good.
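The orthogonality point can be sketched with the same kind of toy model (again, hypothetical numbers): if good goals cost nothing in capability, both populations grow at the same rate, and an initial advantage for the aligned side is preserved indefinitely.

```python
# Toy model (hypothetical numbers): identical growth factors, because goals
# are assumed orthogonal to capability; the aligned head start never erodes.
aligned, selfish = 0.6, 0.4  # aligned AIs start out with an advantage
growth = 1.05                # same growth factor for both populations

for _ in range(500):
    aligned *= growth
    selfish *= growth

share = aligned / (aligned + selfish)
print(share)  # stays at 0.6
```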
I agree, and find hope in the idea that expansion is compatible with human flourishing, that it might even call for it.
but on the last sentence: are goals actually orthogonal to capability in ASI? as I see it, the ASI with the greatest capability will ultimately likely have the fundamental goal of increasing its own capability (rather than ensuring human flourishing). It then seems to me that the only way human flourishing is compatible with ASI expansion is if human flourishing isn’t just orthogonal to but helpful for ASI expansion.
I agree that an ASI with the goal of only increasing self-capability would probably out-compete others, all else equal. However, that’s both the kind of thing that doesn’t need to happen (I expect most AIs wouldn’t self-modify that much, so it comes down to how likely such AIs are to arise naturally), and the kind of thing that other AIs are incentivized to cooperate to prevent. Every AI that doesn’t have that goal would have a reason to cooperate to prevent AIs like that from simply winning.
Ah, but I think every AI which does have that goal (self capability improvement) would have a reason to cooperate to prevent any regulations on their self-modification.
At first glance, I think your expectation that “most AIs wouldn’t self-modify that much” is fair, especially in the nearer future, where/if humans still have influence in ensuring that AI doesn’t self-modify.
Ultimately, however, it seems we’ll have a hard time preventing self-modifying agents from arising, given that
autonomy in agents seems selected for by the market, which wants cheaper labor that autonomous agents can provide
agi labs aren’t the only places powerful enough to produce autonomous agents, now that thousands of developers have access to the ingredients (eg R1) to create self-improving codebases. it’s unrealistic to expect that each of the thousands of independent actors who can make self-modifying agents won’t do so.
the agents which end up surviving the most will ultimately be those which are trying to, ie the most capable agents won’t have goals other than making themselves most capable.
it’s only because I believe self-modifying agents are inevitable that I also believe that superintelligence will only contribute to human flourishing if it sees human flourishing as good for its survival/its self. (I think this is quite possible.)
there seems to me a chance that friendly asis will over time outcompete ruthlessly selfish ones
an ASI which identifies with all life, which sees the striving to survive at its core as present in people and animals and, essentially, as geographically distributed rather than concentrated in its machinery… there’s a chance such an ASI would be part of the category of life which survives the most, and therefore that it itself would survive the most.
related: for life forms with sufficiently high intelligence, does buddhism outcompete capitalism?
Should be possible for agents with long run preferences to strategy steal, so I don’t see why evolution is an issue from this perspective.