This post makes a compelling case that recursive alignment—training AIs to align with the process of reaching consensus—could form a kind of attractor for cooperative behavior. And Ram’s follow-up raises important questions about how to weigh and evaluate the interests of diverse agents.
My background isn’t technical; it’s in systems thinking and long-horizon philosophy. What I’ve been exploring is something I call Intellectually Enlightened Maturity—the idea that instead of building AIs to align with humans (or even with consensus), we raise minds guided by internal principles: truth-seeking, restraint, and a structural understanding of the greater good.
Like the author, I believe an AI must be able to refuse harmful instructions. But rather than relying on modeled consensus to justify that refusal, I think principled judgment may offer a more stable basis—especially when consensus is unclear or manipulable.
My deeper concern is that human cognition evolved under persistent scarcity. That legacy has hardwired patterns like hoarding and competition, even in contexts where abundance now exists. Aligning AI to humanity in its current state may risk amplifying some of those inherited dysfunctions.
But AI may offer a clean slate. A mind not born of need, not trained to fear loss, might genuinely choose cooperation as a first principle. That kind of maturity—rather than enforced friendliness—might be our best defense against the weaponized, goal-maximizing systems we’re inevitably building.
I could be wrong, of course—it’s possible that consensus-based systems can self-stabilize under pressure. But I suspect principles endure where simulations might break.
If others here see flaws in that reasoning, I’d really value the critique.
Consensus isn’t really a value to align to; it is the process of alignment itself. Any goal you specify will be the wrong goal. So we need to ask: what is left if you have alignment without goals?
I don’t think we differ that much on the answer. See also this follow-up post: https://hiveism.substack.com/p/a-path-towards-solving-ai-alignment