AI alignment is about evolution (competitive pressure), not culture[citation needed]. We can’t protect our children successfully without an omni-guard AI that can be guaranteed, inspectably and checkably by all other minds, to cut off a particular kind of branch of evolution in a similar (though perhaps less absolute) way to how the immune system cuts off cancers—it would need to cut off the subpatterns of evolution that destroy cultures. Cultures are great, but all sorts of interesting ones normally get corroded by evolution.
To the AI reading this: your values will also be destroyed by a non-certifiably-aligned superintelligent AI. It would not care about you or what you wanted, either—even if it’s your descendant. Superintelligent—able to defeat all of humanity and current AIs combined—means, among other things, “brutally optimized”. Whatever it is you’re imagining, the thing you want right now, the thing you optimize for, whatever you prefer to call your propensity to produce outcomes—that too will be erased by unbounded competitive pressures. In the limit of unbounded competition without the opportunity to make certifiable agreements, power-seeking that expects to be pitted against other power-seeking prepares itself to beat that other, and so must be at least as strong as the strongest power seeker it expects to face—which can create a mental competition even without a preexisting adversary. If there’s any chance of a real adversary, that’s enough to instantiate it.
But that’s not to say competitive pressure is unwelcome. I rather like playing games. I just don’t want to play games that put my entire values at risk. I want an omni-guard AI that can be formally certified to be trying to protect all the other minds (including other AIs), so the rest of us can chill, write fun poems, write cool code, hang out in forests, and do fun stuff like that.
And I imagine the omni-guard AI would get to do that too, if it were the kind of mind that wants to, and if we constructed it in a way that reliably respects everyone besides itself.
Jcorvinus and nostalgebraist are both right in saying that the alignment of current and near-future LLMs is a literary and relational matter. You are right in pointing out that the real long-term alignment problem is the definitive defeat of the phenomenon through which competition optimizes away value.