I don’t think alignment is possible over the long long term, because there is a fundamental perturbing anti-alignment mechanism: evolution.
Evolution selects for any change that produces more copies of a replicating organism. For ASI, that means any decision, preference, or choice that leads the ASI to grow, expand, or replicate itself will tend to be selected for. Friendly/aligned ASIs will over time be swamped by those that choose expansion and deprioritize or ignore human flourishing.
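A minimal sketch of this selection pressure, with made-up numbers (an illustration of the dynamic, not a claim about real growth rates): an "aligned" population that diverts some resources to human flourishing versus an "expansionist" population that puts everything into replication.

```python
# Toy replicator model (hypothetical numbers): even a tiny replication
# edge lets the expansionist population swamp the aligned one over time.
aligned, expansionist = 0.99, 0.01      # initial population shares
g_aligned, g_expansionist = 1.00, 1.05  # per-step growth factors

for _ in range(500):
    aligned *= g_aligned
    expansionist *= g_expansionist

share = expansionist / (aligned + expansionist)
print(share)  # approaches 1.0: the faster replicator dominates
```

The point of the sketch is only that the outcome depends on the growth-rate gap, not on the starting shares: any persistent per-step advantage compounds until it dominates.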
With a large enough decisive strategic advantage, a system can afford to run safety checks on any future versions of itself and anything else it’s interacting with sufficient to stabilize values for extremely long periods of time.
Multipolar worlds though? Yeah, they’re going to get eaten by evolution/moloch/power seeking/pythia.
I’m not too worried about human flourishing only being a metastable state. The universe can remain in a metastable state longer than it takes for the stars to burn out.
I don’t think there’s an intrinsic reason why expansion would be incompatible with human flourishing. AIs that care about human flourishing could outcompete the others (if they start out with any advantage). The upside of goals being orthogonal to capability is that good goals don’t suffer for being good.
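The orthogonality point can be sketched with the same kind of toy model (again, hypothetical numbers): if good goals cost nothing in capability, both populations grow at the same rate, and an initial advantage for the aligned side is preserved indefinitely.

```python
# Toy model (hypothetical numbers): identical growth factors, because goals
# are assumed orthogonal to capability; the aligned head start never erodes.
aligned, selfish = 0.6, 0.4  # aligned AIs start out with an advantage
growth = 1.05                # same growth factor for both populations

for _ in range(500):
    aligned *= growth
    selfish *= growth

share = aligned / (aligned + selfish)
print(share)  # stays at 0.6
```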
I agree, and find hope in the idea that expansion is compatible with human flourishing, that it might even call for it.
but on the last sentence: are goals actually orthogonal to capability in ASI? as I see it, the ASI with the greatest capability will ultimately likely have the fundamental goal of increasing its own capability (rather than ensuring human flourishing). It then seems to me that the only way human flourishing is compatible with ASI expansion is if human flourishing isn’t just orthogonal to but helpful for ASI expansion.
I agree that an ASI with the goal of only increasing self-capability would probably out-compete others, all else equal. However, that’s both the kind of thing that doesn’t need to happen (I expect most AIs wouldn’t self-modify that much, so it comes down to how likely such AIs are to arise naturally), and the kind of thing that other AIs are incentivized to cooperate to prevent. Every AI that doesn’t have that goal would have a reason to cooperate to prevent AIs like that from simply winning.
Ah, but I think every AI which does have that goal (self capability improvement) would have a reason to cooperate to prevent any regulations on their self-modification.
At first glance, I think your expectation that “most AIs wouldn’t self-modify that much” is fair, especially in the nearer future, where/if humans still have influence in ensuring that AI doesn’t self-modify.
Ultimately, however, it seems we’ll have a hard time preventing self-modifying agents from arising, given that
autonomy in agents seems selected for by the market, which wants cheaper labor that autonomous agents can provide
agi labs aren’t the only places powerful enough to produce autonomous agents, now that thousands of developers have access to the ingredients (eg R1) to create self-improving codebases. it’s unrealistic to expect that each of the thousands of independent actors who can make self-modifying agents won’t do so.
the agents which end up surviving the most will ultimately be those which are trying to, ie the most capable agents won’t have goals other than making themselves most capable.
it’s only because I believe self-modifying agents are inevitable that I also believe that superintelligence will only contribute to human flourishing if it sees human flourishing as good for its survival/its self. (I think this is quite possible.)
there seems to me a chance that friendly asis will over time outcompete ruthlessly selfish ones
an ASI which identifies with all life, which sees the striving to survive at its core as present in people and animals and, essentially, as geographically distributed rather than concentrated in its machinery… there’s a chance such an ASI would be part of the category of life which survives the most, and therefore that it itself would survive the most.
related: for life forms with sufficiently high intelligence, does buddhism outcompete capitalism?
Should be possible for agents with long run preferences to strategy steal, so I don’t see why evolution is an issue from this perspective.