In ‘“Emergent” misaligned outcomes’ I was trying to gesture at something like the selection effects on non-human systems (made of humans, for now), and how they might incorporate AI capabilities (even non-agentic ones, or corrigible agentic ones) and at some point become ‘more than merely/mainly selected’.
Sometimes I find myself reaching for phrases like ‘non-agentic AGI embedded in an economic context’. What I am trying to express is a fuzzier but maybe more concerning idea: that the political or economic environment itself might contain non-human goal-directed or optimising pressures/forces, and that particular tyrannical humans or human organisations are just some of the outputs of those forces.
Such an optimising force, being inhuman, is liable to be misaligned by default. In fact, this more general version of the argument may apply even if we succeed in making corrigible agentic AGI; perhaps obvious in hindsight, this has only become clear to me as I write.
We could suggestively call such an emergent goal-directed system a ‘miscoordination demon’. I would claim that such systems are already substantially reducing the amount of value in the world, and will plausibly continue to do so with or without the introduction of AGI. Depending on whether the introduction of AGI differentially empowers human agency or such miscoordination demons, we could imagine humans either subduing them (perhaps permanently) or being subdued by them (perhaps permanently).
I think this is something like a garbled proto version of the gradual disempowerment perspective, and I appreciate the crisper articulation here and in your other writing.