My response to the alignment / AI representatives proposals:
Even if AIs are “baseline aligned” with their creators, this doesn’t automatically mean they are aligned with broader human flourishing, or that they are capable of compelling humans to coordinate against systemic risks. For an AI to effectively say, “You are messing up, please coordinate with other nations/groups, stop what you are doing,” requires not just truthfulness but also immense persuasive power and, crucially, human receptiveness. Even if pausing AI were the correct thing to do, Claude is not going to suggest this to Dario, for obvious reasons. As we’ve seen even with entirely human systems (e.g., the Trump administration and tariffs), possessing information, or even offering correct advice, doesn’t guarantee it will be heeded or lead to effective collective action.
[...] “Politicians...will remain aware...able to change what the system is if it has obviously bad consequences.” The climate change analogy is pertinent here. We have extensive scientific consensus, an “oracle IPCC report” detailing dire consequences, yet coordinated global action remains insufficient to meet the scale of the challenge. Political systems can be slow, captured by short-term interests, or unable to enact unpopular measures even when the long-term risks are “obviously bad.” The paper [gradual disempowerment] argues that AI could further entrench these problems by providing powerful tools for influencing public opinion and by creating economic dependencies that make course correction harder.
Extract copy-pasted from a longer comment here.