I like this review/retelling a lot.
Minor point
Regarding the “Phase I” and “Phase II” terminology: while it has some pedagogical value, I worry about people interpreting it as a clear temporal decomposition, with the implication that we first solve alignment and then move on to Phase II.
In reality, the dynamics are far messier, with some ‘Phase II’ elements already complicating our attempts to address ‘Phase I’ challenges.
Some of the main concerning pathways include:
- People attempting to harness superagent-level powers to advance their particular visions of the future. For example, Leopold-style thinking of “let’s awaken the spirit of the US and its servants to engage in a life-or-death struggle with China.” Such spirits seem way easier to summon than to control. We already see a bunch of people feeling patriotic about AGI and feeling the need to move as fast as possible so that their nation wins. AGI is to a large extent already being developed by memeplexes/superagents; people close to the development are partially deluding themselves about how much control they individually have over the process, or even about the identity of the ‘we’ they assume the AI will be aligned with. Memes often hide as parts of people’s identities.