Suppose humans retained control and came across some aliens who didn’t want to be replaced with computronium and wanted to live on their planet. Do you think that humans would do something to these aliens as bad (from their perspective) as killing them?
I think this type of caring (where you respect the actual preferences of existing intelligent agents) seems reasonably plausible to me.
I think humans don’t have actual “respect for preferences of existing agents” in a way that doesn’t pose existential risks to agents weaker than them.
Imagine a planet of conscious paperclippers. They are pre-Singularity paperclippers, so they are not exactly coherent single-minded agents: they have a lot of shards of desire, and if you take their children and put effort into their upbringing, the children won’t grow into single-minded paperclippers and will have some sort of alien fun. But the majority of their establishment and conventional morality says that the best future outcome is to build a superintelligent paperclip-maximizer and die, turning into paperclips. Yes, including the children. Yes, they would strongly object if you tried to divert them from this course. They won’t accept being bought a lot of paperclips somewhere else, just as humanity won’t accept getting paperclipped in exchange for a Utopia built somewhere else.
I actually don’t know what position future humanity would take on this hypothetical, but I predict that a significant faction would be really unhappy and demand violent intervention.
Except that SOTA establishment and conventional morality are arguably either manageable or far from settled. There is an evolution-related case against the possibility that Paperclippers can arise naturally. What could be SOTA model organisms of alien cultures that we would call misaligned? The Aztec culture before being discovered? Or something more recent, like Kampuchea? And how does the chance that mankind deems a culture unworthy of continued existence depend on the culture’s capabilities, including those of self-reflection?
Humans will lose trust in the media they have staked their daily existence on; news media and legal procedures will be altered by the influx of Sora-generated images and videos, raising the level of national aggression we already see acted out. Thanks to humans’ underdeveloped emotional responses and highly developed weaponry, and the propensity of US culture in particular toward violence, AI will aid in the spread of death and destruction.
From my current stance it is plausible, because we haven’t philosophically settled how we think of aliens (especially ones whose behavior is far outside our own). I most likely don’t respect arbitrary intelligent agents: I’d be in favor of getting rid of a vulnerable paperclipper if we found one on the far edges of the galaxy.
Then, I think you’re not mentally extrapolating how much that computronium would give. Even from our current perspective the logic makes sense: you upload the aliens regardless, even if you respect their preferences in other ways, because doing so lets you simulate vastly more aliens or other humans at the same time.
I expect we would care about their preferences. However, those preferences would end up to some degree subordinate to our own: the clear, obvious case is that we probably wouldn’t allow them an ASI (depending on how attack/defense works out), but the other is that we might upload them regardless because of the sheer benefits.
Beyond that, I disagree about how common that motivation is. The kind of learning we know naturally produces it (limited social agents modeling each other in an iterated environment) is currently not on track to apply to AI… and another route is to “just care strategically”, especially if you’re intelligent enough. I feel this is extrapolating a relatively modern human line of thought to arbitrary kinds of minds.
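To put a rough number on those “sheer benefits”, here is a back-of-envelope sketch in Python. Every figure in it is an assumed order-of-magnitude placeholder chosen for illustration, not a claim made anywhere in this thread:

```python
# Back-of-envelope sketch. All three numbers below are assumptions chosen
# purely for illustration: a roughly Earth-scale population, a rough cost
# for running one mind emulation, and a rough budget for a planet-mass
# computer turned into computronium.
ALIEN_POPULATION_BIOLOGICAL = 1e10   # assumed biological population
OPS_PER_EMULATED_MIND = 1e17         # assumed ops/sec to run one emulated mind
PLANET_COMPUTRONIUM_OPS = 1e42       # assumed ops/sec from a planet-mass computer

simultaneous_uploads = PLANET_COMPUTRONIUM_OPS / OPS_PER_EMULATED_MIND
gain = simultaneous_uploads / ALIEN_POPULATION_BIOLOGICAL

print(f"Emulated minds supported at once: {simultaneous_uploads:.0e}")
print(f"Factor over the biological population: {gain:.0e}")
# With these made-up numbers the planet-as-computronium option supports
# roughly 15 orders of magnitude more minds, which is the pull toward
# uploading that the comment above is pointing at.
```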
I think it’s plausible that an AI that has this kind of caring could exist, but actually getting this AI instead of one that doesn’t care at all seems very unlikely.
IMO “respect the actual preferences of existing intelligent agents” is a very narrow target in mind-space. I.e. if we had any reason to believe the AI has a decent chance of being this kind of mind, the alignment problem would be 90% solved. The hard part is going from “AI that kills everyone” to “AI that doesn’t kill everyone”. Once you’re there, getting to “AI that benefits humanity, or at least leaves for another star system” is comparatively trivial.
The mind-space of the Orthogonality Thesis is a set of possibilities. The random-potshot version of the OT argument has it that there is an even chance of hitting any mind, and therefore a high chance of hitting an eldritch, alien mind. But equiprobability is only one way of turning possibilities into probabilities, and not a particularly realistic one. Random potshots aren’t analogous to the probability density over outcomes of deliberately building a certain type of AI, even without knowing much about what it will be.
While many of the minds in mind-space are indeed weird and unfriendly to humans, that does not make it likely that the AIs we will construct will be. We are deliberately seeking to build duplicates of our minds, for one thing, and we have certain limitations, for another.
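As a toy way to see the “equiprobability is only one way” point, here is a short Python sketch; the categories and the weights in the skewed prior are purely hypothetical, chosen only to show how much the answer depends on the prior rather than on mind-space itself:

```python
# Toy illustration: the probability of hitting an "eldritch" mind is a fact
# about the prior over mind-space, not about mind-space alone. The categories
# and weights below are hypothetical.
mind_space = ["human-like", "animal-like", "eldritch/alien"]

# "Random potshot" prior: every region of mind-space is equally likely.
uniform_prior = {m: 1 / len(mind_space) for m in mind_space}

# A prior skewed toward human-like minds, standing in for the fact that we
# deliberately train on human data and select for human-legible behaviour.
skewed_prior = {"human-like": 0.70, "animal-like": 0.25, "eldritch/alien": 0.05}

for name, prior in [("uniform", uniform_prior), ("skewed", skewed_prior)]:
    print(f"{name:8s} P(eldritch/alien) = {prior['eldritch/alien']:.2f}")
```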
I think “deliberately seeking to build” is the wrong way to frame the current paradigm—we’re growing the AIs through a process we don’t fully understand, while trying to steer the external behaviour in the hopes that this corresponds to desirable mind structures.
If we were actually building the AIs, I would be much more optimistic about them coming out friendly.
Not fully understanding things is the default… even non-AI software can’t be fully understood if it is complex enough. We already know how to probe systems we don’t understand a priori, through scientific experimentation. You don’t have to get alignment right the first time, at least not without the foom/RRSI or incorrigibility assumptions.
The difference with normal software is that at least somebody understands every individual part, and if you collected all those somebodies and locked them in a room for a while they could write up a full explanation. Whereas with AI I think we’re not even like 10% of the way to full understanding.
Also, if you’re trying to align a superintelligence, you do have to get it right on the first try, otherwise it kills you with no counterplay.
That has not been demonstrated.
(Gestures towards IABIED.)
(Gestures towards critiques thereof.)
My claim is that you’re reasonably likely to get a small preference for something like this by default even in the absence of serious effort to ensure alignment beyond what is commercially expedient. For instance, note that humans put some weight on “respect the actual preferences of existing intelligent agents” despite this being a “narrow target in mind-space”.
Analogy to humans feels like generalizing from one example to me. My prior is that minds evolved under different circumstances will have different desires, so we shouldn’t expect an AI to robustly share any specific human value unless we can explain exactly how it develops that value.
But that aside, would you agree that if this were true, alignment should be fairly easy, because we just need to amplify the degree of caring?