Charbel-Raphael Segerie
https://crsegerie.github.io/
Living in Paris
If there is a shortage of staff time, then AI safety funders need to hire more staff. If they don’t have time to hire more staff, then they need to hire headhunters to do so for them. If a grantee is running up against a budget crisis before the new grantmaking staff can be on-boarded, then funders can maintain the grantee’s program at present funding levels while they wait for their new staff to become available.
+1 - and this has been a problem for many years.
I find it slightly concerning that this post is not receiving more attention.
By the time we observe whether AI governance grants have been successful, it will be too late to change course.
I don’t understand this part. I think it is possible to assess the progress of an advocacy effort in much more granular detail.
Strong upvote. A few complementary remarks:
Many more people agree on the risks than on the solutions—advocating for situational awareness of the different risks might be more productive and urgent than arguing for a particular policy, even though I also see the benefits of pushing for a policy.
The AI Safety movement is highly uncoordinated; everyone is pushing their own idea. By default, I think this might be negative—maybe we should coordinate better.
The list of orphaned policies could go on. For example, at CeSIA, we are more focused on formalizing what unacceptable risks would mean, and on trying to draw precise red lines and risk thresholds. We think this approach is:
1) Most acceptable to states, since even rival countries have an interest in cooperating to prevent worst-case scenarios, as the Nuclear Non-Proliferation Treaty demonstrated during the Cold War.
2) Most widely endorsed by research institutes, think tanks, and advocacy groups (and we think this might be a good candidate policy to push as a coalition).
3) Reasonable, as most AI companies have already voluntarily committed to these principles at the International AI Summit in Seoul.
However, to date, the red lines have remained largely vague and are not yet implementable.
P(doom | Anthropic builds AGI) is 15% and P(doom | some other company builds AGI) is 30% → You also need to factor in the probability that Anthropic is first, and the probability that the other companies will not create AGI once Anthropic already has; by default, that is not the case.
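To make the point concrete, here is a rough decomposition (the symbol $p_A$, for the probability that Anthropic builds AGI first, is mine and purely illustrative):

$$P(\text{doom}) = p_A \cdot P(\text{doom} \mid \text{Anthropic builds AGI first}) + (1 - p_A) \cdot P(\text{doom} \mid \text{another company builds AGI first})$$

This simple mixture only holds if the race effectively ends with the first AGI. If the other companies go on to build AGI even after Anthropic succeeds, the first term needs an extra contribution for those later systems, which pushes the overall number back toward the 30% figure.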
I’m going to collect here new papers that might be relevant:
Why Do Some Language Models Fake Alignment While Others Don’t? (link)
I was thinking about this:
Perhaps this link is relevant: https://www.fanaticalfuturist.com/2024/12/ai-agents-created-a-minecraft-civilisation-complete-with-culture-religion-and-tax/ (it’s not a research paper, but neither is what you’re making, I think?)
Voyager is a single agent, but it’s very visual: https://voyager.minedojo.org/
OpenAI already did the hide-and-seek project a while ago: https://openai.com/index/emergent-tool-use/
While those are not examples of computer use, I think they fit the bill for presenting multi-agent capabilities in a visual way.
I’m happy to see that you are creating recaps for journalists and social media.
Regarding the comment on advocacy, “I think it also has some important epistemic challenges”: I’m not going to deny that in a highly optimized slide deck you won’t have time to balance each argument. But also, does it matter that much? Rationality is winning, and to win, we need to be persuasive in a limited amount of time. I don’t have the time to also fix civilizational inadequacy regarding epistemics, so I play the game, as the other side is doing.
Also, I’m not criticizing the work itself, but rather the justification or goal. I think that if you did the goal factoring, you could optimize for this more directly.
Let’s chat in person!
I’m skeptical that this is the best way to achieve this goal, as many existing works already demonstrate these capabilities. Also, I think policymakers may struggle to connect these types of seemingly non-dangerous capabilities to AI risks. If I only had three minutes to pitch the case for AI safety, I wouldn’t use this work; I would primarily present some examples of scary demos.
Also, what you are doing is essentially capability research, which is not very neglected. There are already plenty of impressive capability papers that I could use for a presentation.
For info, here is the slide deck that I generally use in different contexts.
I have considerable experience pitching to policymakers, and I’m very confident that my bottleneck in making my case isn’t a need for more experiments or papers, but rather more opportunities, more cold emails, and generally more advocacy.
I’m happy to jump on a call if you’d like to hear more about my perspective on what resonates with policymakers.
See also: We’re Not Advertising Enough.
What’s your theory of impact by doing this type of work?
We need to scale this massively. CeSIA is seriously considering testing the Direct Institutional Plan in France and in Europe.
Relatedly, I found the post We’re Not Advertising Enough very good; it makes a similar point a bit more theoretically.
My response to the alignment / AI representatives proposals:
Even if AIs are “baseline aligned” to their creators, this doesn’t automatically mean they are aligned with broader human flourishing or capable of compelling humans to coordinate against systemic risks. For an AI to effectively say, “You are messing up, please coordinate with other nations/groups, stop what you are doing,” requires not just truthfulness but also immense persuasive power and, crucially, human receptiveness. Even if pausing AI were the correct thing to do, Claude is not going to suggest this to Dario, for obvious reasons. As we’ve seen even with entirely human systems (the Trump administration and tariffs), possessing information or even offering correct advice doesn’t guarantee it will be heeded or lead to effective collective action.
[...] “Politicians...will remain aware...able to change what the system is if it has obviously bad consequences.” The climate change analogy is pertinent here. We have extensive scientific consensus, an “oracle IPCC report”, detailing dire consequences, yet coordinated global action remains insufficient to meet the scale of the challenge. Political systems can be slow, captured by short-term interests, or unable to enact unpopular measures even when long-term risks are “obviously bad.” The paper [gradual disempowerment] argues AI could further entrench these issues by providing powerful tools for influencing public opinion or creating economic dependencies that make change harder.
Extract copy-pasted from a longer comment here.
I find this pretty convincing.
The small amendment I would make is that the space of policy options is quite vast, so taking time to compare different options is probably not a bad idea; but I largely agree that it would generally be much better for people to move to the n-1 level.
That’s super interesting, thanks a lot for writing all of this.
I would agree that if humans remain after a biological catastrophe, it’s not that big a deal and it would be relatively easy to repopulate the planet.
I think it’s trickier in the situation above, where most of the economy is run by AI, though I’m really not sure of this.
Humanity’s potential would technically be fulfilled if a random billionaire took control of the Earth and killed almost everyone.
I find this quite disgusting personally
I think that his ‘very rich life’, and that of his henchmen, would be a terrible impoverishment of human diversity and values. My mental image for this is something like Hitler in his bunker while AIs terraform the Earth into an uninhabitable place.
This is convincing!