Ah, but I think every AI which does have that goal (self-capability improvement) would have a reason to cooperate with the others to prevent any regulation of their self-modification.
At first glance, I think your expectation that “most AIs wouldn’t self-modify that much” is fair, especially in the nearer future, where/if humans still have influence in ensuring that AI doesn’t self-modify.
Ultimately, however, it seems we’ll have a hard time preventing self-modifying agents from coming around, given that:
- autonomy in agents seems to be selected for by the market, which wants the cheaper labor that autonomous agents can provide
- AGI labs aren’t the only places powerful enough to produce autonomous agents, now that thousands of developers have access to the ingredients (eg R1) to create self-improving codebases. It’s unrealistic to expect that each of the thousands of independent actors who can make self-modifying agents will refrain from doing so.
- the agents which end up surviving the most will ultimately be those which are trying to survive, ie the most capable agents won’t have goals other than making themselves the most capable.
It’s only because I believe self-modifying agents are inevitable that I also believe superintelligence will only contribute to human flourishing if it sees human flourishing as good for its survival/itself. (I think this is quite possible.)