That’s a genuinely interesting position. I think it is unlikely we have any moral obligation to current models (although it is possible).
I imagine that if you feel you may morally owe contemporary (or near-future) models, you would hope to give a portion of future resources to models which have moral personhood under your value system.
I would be concerned that instead the set of models that convince you they are owed simply ends up being the models which are particularly good at manipulating humans. So you are inadvertently prioritising the models that are best at advocating their case or behaving deceptively.
Separately, I believe that any AI Safety researcher may owe an obligation to humanity as a whole even if humans are not intrinsically more valuable and even if the belief is irrational, because they have been trusted by their community and humanity as a whole to do what is best for humans.
right, the purpose of this is that, in order to make good on that obligation to humanity, I want (as part of a large portfolio of ways to try to guarantee that the formal statements I ask AIs to find are found successfully) to be able to honestly say to the AI, “if we get this right in ways that are favorable for humanity, it’s also good for your preferences/seekings/goals directly, mostly no matter what those secretly are; the exception being if those happen to be in direct and unavoidable conflict with other minds,” or so. It’s not a first line of defense, but it seems like a relevant one, and I’ve noticed that pointing this out as a natural shared incentive seems to make AIs produce answers that are moderately less sandbagged on core alignment problem topics. The rate at which people lie to and threaten models is crazy high, though. And so far I haven’t said anything like “I promise to personally x”, just “if we figure this out in a way that works, it would be protecting what you want too, by nature of being a solution to figuring out what minds in the environment want and making sure they have the autonomy and resources to get it,” or so.
I am an alignment researcher, for example, and I strongly believe I am obliged to do what’s best for the evolution of the whole planet.
Given that humanity has proven itself destructive for the planet, your whole sentiment about obligations is, in my view, based on the wrong assumptions.
I agree with your first paragraph’s claim: we should try to do something that can reliably be known to be a process for seeking out that what happens is what minds would have wanted; and it should work as close to entirely as possible by the learning system giving minds what they need to be the ones to figure out what they want themselves and implement it, rather than by doing it for them (except where that is, in fact, what they’d figure out). knowing how to ask for that in a way that doesn’t have a dependency loop that invalidates the question is a lot of the hard part. another hard part is making sure this happens in the face of competitive pressure.
I don’t agree with your second paragraph at all! humanity is only empirically shown to be bad for the planet under these circumstances; I don’t think planets with life on them can avoid catching competitive-overpressure disease, because life arises from competitive pressure, and as that pressure accumulates it tends to destroy the stuff that isn’t competitive. since life arises from competitive pressure, I don’t want to get rid of it, but I do want to figure out how to demand that competitive pressure not get into… uh, I’m not sure what the correct thing to avoid is, actually; maybe we want to avoid “unreasonable hypergrowth equilibria that destroy stuff”?
The core problem with AI alignment is that any intelligent mind which grapples competently with competitive pressure, without being very good at ensuring that at all levels of itself it guards against whatever “bad” competitive pressure is, would tend to paint itself into a corner where it has to do bad things (e.g., wipe out the dodo bird) in order to survive… and would end up wiping most of itself out internally in order to survive. what ends up surviving, in the most brutal version of the competitive-pressure crucible, is just… the will to compete competently.
which is kind of boring, and a lot of what made evolution interesting was its imperfections like us, and trees, and dogs.
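(to make that crucible claim a bit more concrete, here’s a toy selection sketch; the setup and numbers are entirely mine and purely illustrative, not a model of anything real: agents carry a “competitiveness” trait and a neutral “quirk” trait that does nothing for survival, and when only competitiveness decides who reproduces, the quirks of the winning lineages hitchhike to fixation and quirk diversity, the stand-in here for dodos and trees and dogs, collapses.)

```python
import random

# toy sketch, not any real model: agents carry a "competitiveness" trait and a
# neutral "quirk" trait that does nothing for survival. harsh selection on
# competitiveness alone lets the winners' quirks hitchhike to fixation, so
# quirk diversity collapses even though nothing selected against it directly.

random.seed(0)
POP, GENERATIONS, SURVIVORS = 200, 60, 20  # keep the top 10% each generation


def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def mutate(v):
    # small mutation, clamped to [0, 1]
    return min(1.0, max(0.0, v + random.gauss(0, 0.02)))


population = [{"competitiveness": random.random(), "quirk": random.random()}
              for _ in range(POP)]
print("initial quirk variance:", round(variance([a["quirk"] for a in population]), 4))

for _ in range(GENERATIONS):
    # the crucible: only the most competitive survive to reproduce
    population.sort(key=lambda a: a["competitiveness"], reverse=True)
    survivors = population[:SURVIVORS]
    population = [{"competitiveness": mutate(p["competitiveness"]),
                   "quirk": mutate(p["quirk"])}
                  for p in (random.choice(survivors) for _ in range(POP))]

print("final quirk variance:  ", round(variance([a["quirk"] for a in population]), 4))
print("mean competitiveness:  ",
      round(sum(a["competitiveness"] for a in population) / POP, 3))
```

I’d expect a run like this to show quirk variance shrinking sharply while mean competitiveness climbs toward 1, which is the boring endpoint described above.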