Charbel-Raphaël

Karma: 2,424

Charbel-Raphael Segerie

https://crsegerie.github.io/

Living in Paris

The bitter lesson of misuse detection

Charbel-Raphaël and Hadrien

Jul 10, 2025, 2:50 PM

20 points

1 comment7 min readLW link

Charbel-Raphaël Jul 1, 2025, 4:44 PM
4 points
0
in reply to: Mass_Driver’s comment on: Political Funding Expertise (Post 6 of 7 on AI Governance)
This is convincing!

Charbel-Raphaël Jun 28, 2025, 11:00 PM
6 points
7
on: Mainstream Grantmaking Expertise (Post 7 of 7 on AI Governance)
If there is a shortage of staff time, then AI safety funders need to hire more staff. If they don’t have time to hire more staff, then they need to hire headhunters to do so for them. If a grantee is running up against a budget crisis before the new grantmaking staff can be on-boarded, then funders can maintain the grantee’s program at present funding levels while they wait for their new staff to become available.
+1 - and this has been a problem for many years.

Charbel-Raphaël Jun 28, 2025, 8:55 PM
6 points
2
on: Political Funding Expertise (Post 6 of 7 on AI Governance)
I find it slightly concerning that this post is not receiving more attention.

Charbel-Raphaël Jun 28, 2025, 8:51 PM
2 points
0
on: Political Funding Expertise (Post 6 of 7 on AI Governance)
By the time we observe whether AI governance grants have been successful, it will be too late to change course.
I don’t understand this part. I think that it is possible to assess in much more granular detail the progress of some advocacy effort.

Charbel-Raphaël Jun 28, 2025, 6:58 PM
6 points
0
on: Orphaned Policies (Post 5 of 6 on AI Governance)
Strong upvote. A few complementary remarks:
- Many more people agree on the risks than on the solutions—advocating for situational awareness of the different risks might be more productive and urgent than arguing for a particular policy, even though I also see the benefits of pushing for a policy.
- The AI Safety movement is highly uncoordinated; everyone is pushing their own idea. By default, I think this might be negative—maybe we should coordinate better.
- The list of orphaned policies could go on—for example, at CeSIA, we are more focused on formalizing what unacceptable risks would mean, and trying to trace precise red lines and risk thresholds. We think this approach is: 1) Most acceptable to states, since even rival countries have an interest in cooperating to prevent worst-case scenarios, as demonstrated by the Nuclear Non-Proliferation Treaty during the Cold War. 2) Most widely endorsed by research institutes, think tanks, and advocacy groups (and we think this might be a good candidate policy that should be pushed in a coalition). 3) Reasonable, as most AI companies have already voluntarily committed to these principles during the International AI Summit in Seoul. However, to date, the red lines have been largely vague and are not yet implementable.

Charbel-Raphaël Jun 11, 2025, 11:32 AM
11 points
6
in reply to: Fabien Roger’s comment on: Mikhail Samin’s Shortform
P(doom|Anthropic builds AGI) is 15% and P(doom|some other company builds AGI) is 30% --> You need to add to this the probability that Anthropic is first and that the other companies are not going to create AGI if Anthropic already created it. this is by default not the case

Charbel-Raphaël Jun 3, 2025, 4:37 PM
LW: 2 AF: 1
0
AF
on: The 80/20 playbook for mitigating AI scheming in 2025
I’m going to collect here new papers that might be relevant:
- https://x.com/bartoszcyw/status/1925220617256628587
- Why Do Some Language Models Fake Alignment While Others Don’t? (link)

Charbel-Raphaël Jun 3, 2025, 11:53 AM
2 points
0
in reply to: Adam B’s comment on: Season Recap of the Village: Agents raise $2,000
I was thinking about this:
- Perhaps this link is relevant: https://www.fanaticalfuturist.com/2024/12/ai-agents-created-a-minecraft-civilisation-complete-with-culture-religion-and-tax/ (it’s not a research paper, but neither you I think?)
- Voyager is a single agent, but it’s very visual: https://voyager.minedojo.org/
- OpenAI already did the hide-and-seek project a while ago: https://openai.com/index/emergent-tool-use/
  
  While those are not examples of computer use, I think it fits the bill for a presentation of multi-agent capabilities in a visual way.
I’m happy to see that you are creating recaps for journalists and social media.
Regarding the comment on advocacy, “I think it also has some important epistemic challenges”: I’m not going to deny that in a highly optimized slide deck, you won’t have time to balance each argument. But also, does it matter that much? Rationality is winning, and to win, we need to be persuasive in a limited amount of time. I don’t have the time to also fix civilizational inadequacy regarding epistemics, so I play the game, as is doing the other side.
Also, I’m not criticizing the work itself, but rather the justification or goal. I think that if you did the goal factoring, you could optimize for this more directly.
Let’s chat in person !

Charbel-Raphaël Jun 2, 2025, 9:02 PM
3 points
1
in reply to: Adam B’s comment on: Season Recap of the Village: Agents raise $2,000
I’m skeptical that this is the best way to achieve this goal, as many existing works already demonstrate these capabilities. Also, I think policymakers may struggle to connect these types of seemingly non-dangerous capabilities to AI risks. If I only had three minutes to pitch the case for AI safety, I wouldn’t use this work; I would primarily present some examples of scary demos.
Also, what you are doing is essentially capability research, which is not very neglected. There are already plenty of impressive capability papers that I could use for a presentation.
For info, here is the deck of slides that I generally use in different context.
I have considerable experience pitching to policymakers, and I’m very confident that my bottleneck in making my case isn’t a need for more experiments or papers, but rather more opportunities, more cold emails, and generally more advocacy.
I’m happy to jump on a call if you’d like to hear more about my perspective on what resonates with policymakers.
See also: We’re Not Advertising Enough.

Charbel-Raphaël Jun 2, 2025, 7:03 AM
LW: 4 AF: 2
0
AF
on: Constructability: Plainly-coded AGIs may be feasible in the near future
relevant: https://x.com/adcock_brett/status/1929207216910790946

Charbel-Raphaël Jun 1, 2025, 12:29 PM
7 points
0
on: Season Recap of the Village: Agents raise $2,000
What’s your theory of impact by doing this type of work?

Charbel-Raphaël May 31, 2025, 11:56 PM
28 points
19
on: What We Learned from Briefing 70+ Lawmakers on the Threat from AI
We need to scale this massively. CeSIA is seriously considering to test the Direct Institutional Plan in France and in Europe.
Relatedly, I found the post We’re Not Advertising Enough very good, and making a similar point a bit more theoretically.

The 80/20 playbook for mitigating AI scheming in 2025

Charbel-RaphaëlMay 31, 2025, 9:17 PM

39 points

2 comments4 min readLW link

Charbel-Raphaël May 31, 2025, 7:14 PM
LW: 27 AF: 15
12
AF
on: The best approaches for mitigating “the intelligence curse” (or gradual disempowerment); my quick guesses at the best object-level interventions
My response to the alignment / AI representatives proposals:
Even if AIs are “baseline aligned” to their creators, this doesn’t automatically mean they are aligned with broader human flourishing or capable of compelling humans to coordinate against systemic risks. For an AI to effectively say, “You are messing up, please coordinate with other nations/groups, stop what you are doing” requires not just truthfulness but also immense persuasive power and, crucially, human receptiveness. Even if pausing AI was the correct thing to do, Claude is not going to suggest this to Dario for obvious reasons. As we’ve seen even with entirely human systems (Trump’s Administration and Tariff), possessing information or even offering correct advice doesn’t guarantee it will be heeded or lead to effective collective action.

[...] “Politicians...will remain aware...able to change what the system is if it has obviously bad consequences.” The climate change analogy is pertinent here. We have extensive scientific consensus, an “oracle IPCC report”, detailing dire consequences, yet coordinated global action remains insufficient to meet the scale of the challenge. Political systems can be slow, captured by short-term interests, or unable to enact unpopular measures even when long-term risks are “obviously bad.” The paper [gradual disempowerment] argues AI could further entrench these issues by providing powerful tools for influencing public opinion or creating economic dependencies that make change harder.
Extract copy pasted from a longer comment here.

Charbel-Raphaël May 25, 2025, 11:55 AM
10 points
0
on: We’re Not Advertising Enough (Post 3 of 6 on AI Governance)
I find this pretty convincing.
The small amendment that I would make is that the space of policy options is quite vast and taking time to compare different options is probably not a bad idea, but I largely agree that it would generally be much better for people to move to the n-1 level.
What links here?
- 12345's comment on Techies Wanted: How STEM Backgrounds Can Advance Safe AI Policy by Daniel_Eth (EA Forum; May 29, 2025, 12:44 PM; 1 point)

Charbel-Raphaël May 25, 2025, 10:00 AM
5 points
0
on: The Need for Political Advertising (Post 2 of 6 on AI Governance)
That’s super interesting, thanks a lot for writing all of this.

[Paper] Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods

markov and Charbel-Raphaël

May 19, 2025, 10:38 AM

25 points

0 comments1 min readLW link

Charbel-Raphaël May 18, 2025, 7:05 AM
2 points
0
in reply to: Noosphere89’s comment on: The Risk of Gradual Disempowerment from AI
I would agree if there remain humains after a biological catastrophe, I think that’s not a big deal and it’s easy to repopulate the planet.
I think it’s more tricky in the situation above, where most of the economy is run by AI, thought I’m really not sure of this

Charbel-Raphaël May 17, 2025, 9:30 AM
4 points
1
in reply to: Noosphere89’s comment on: The Risk of Gradual Disempowerment from AI
humanity’s potential is technically fulfilled if a random billionaire took control over earth and killed almost everyone
I find this quite disgusting personally
I think that his ‘very rich life’, and his sbires, would be a terrible impoverishment of human diversity and values. My mental image for this is something like Hitler in his bunker while AIs are terraforming the earth into an inhabitable place.

Charbel-Raphaël

The bit­ter les­son of mi­suse detection

The 80/​20 play­book for miti­gat­ing AI schem­ing in 2025

[Paper] Safety by Mea­sure­ment: A Sys­tem­atic Liter­a­ture Re­view of AI Safety Eval­u­a­tion Methods

The bitter lesson of misuse detection

The 80/20 playbook for mitigating AI scheming in 2025

[Paper] Safety by Measurement: A Systematic Literature Review of AI Safety Evaluation Methods