Jonathan Claybrough

Karma: 724

Software engineer transitioned into AI safety, teaching and strategy. Particularly interested in psychology, game theory, system design, economics.

Jonathan Claybrough 16 Feb 2026 17:00 UTC
6 points
1
in reply to: clone of saturn’s comment on: A multi-level postmortem of how our whole house got badly poisoned
I’ll answer this for fear that you leave with the wrong conclusion.
I am one of the two people who disregarded the partially open exhaust pipe. I have a french engineering school education and am competent at levels quite far beyond what The Way Things Work might cover, ie. I can formally and correctly model the whole system including chemistry (with one important caveat, I didn’t know top of my head that multiple-burning cycles with air reuse will lead to CO, and if that’s covered in The Way Things Work than I would have benefited), installation, material wear, safety engineering and process design.
I think a more appropriate conclusion is that knowledge and ability to model the world doesn’t automatically translate to using these skills. Saliency matters, and the house being rickety made shoddy things seem normal, not worth doing anything about. I was going by to reset the heater, a routine every few days activity, one can imagine I had other things in mind. I noticed that the air smelt bad and the pipe was angled and not sealed, maybe emotionally I had as strong a reaction as “Huh, was that pipe always like that?” before continuing with my day.
The one liner I would impart: “Some things, including heaters, are dangerous when improperly maintained, so practice feeling it as important and dangerous when you notice something off about it”. Proper actions follow from proper emotional reactions.

Jonathan Claybrough 2 Nov 2025 19:02 UTC
6 points
0
in reply to: habryka’s comment on: Should you donate to Lightcone Infrastructure?
I’m not asking you to engage with Mikhail more, I believe I understand it’s frustrating given your extensive prior conversations that still led to this post being made.
Nevertheless, I have found all these comments informative as well as op.
The post says Mikhail sent
Also, apparently, just in case, please don’t act on me having told you that [third party] are planning to do [thing] outside of this enabling you to chat to/coordinate with them)
and that you replied “lol, no” after a week.
I generally don’t want to clash with you as I respect a lot of your public takes etc, but for the same reasons you’re publicly disagreeable I do think it’s worth pointing my disagreement here. Unless you were already on colloquial terms with Mikhail, I find it rude you’d answer “lol no” to that specific request, notably given it used “please”. Even if it was an unreasonable ask, a “sorry but no” would have sufficed.
At the object level, as board member of enais and french centre for ai safety, I don’t even take Mikhail’s message as a surprising or unreasonable ask, unless interpreted stringently. Ofc if the formulation was “please make sure to act indistinguishably, even when assessed by a future superintelligence, on this info”, then a lol no is fine, but if Mikhail sent me this message I’d interpret it as asking the ⁸⁰⁄₂₀ reasonable effort and say I broadly agree tho will take other stuff into account too. In fact I overall guess (or want to believe) that you in fact do broadly agree (you prefer for people to tell you stuff, and soft commit to not indirectly harming people who tell you stuff if you can avoid it) and that you and Mikhail disagree because you’re interpreting Mikhail to be dogmatic and strict about his ask (which might actually be the case).
As an interested third party who generally would like to to work with LightConeInfra and you, unrelated to Mikhail’s specific asks, I’m curious for if you broadly agree to put some non trivial decision weight on not using info people give you in ways they strongly disagree with, even if they didn’t ask you to precomit to that, even if they were mistaken in some assumptions. (If you later get that info from other places you’re ~released from the first obligations, tho this shouldn’t be gamed)

Jonathan Claybrough 2 Nov 2025 18:19 UTC
−2 points
−18
in reply to: habryka’s comment on: Should you donate to Lightcone Infrastructure?
lw logistics thing, I’m annoyed that various replies are down voted rather than disagree voted,- even if readers find their tone not to lw standard they’re an important enough exchange within the context of the original post and these threads they shouldn’t get close to automatically hidden by being in negatives. Disageeevote or comment saying the tone is bad but keep these positive so future readers can find these easily

Jonathan Claybrough 30 Sep 2025 8:37 UTC
10 points
5
on: Reasons to sell frontier lab equity to donate now rather than later
My quick impression is that strong aligned talent is still hard to find and it’s worth increasing the funding now so that there will be talented people to fill in the demand coming in a few years. Fieldbuilding still matters a lot, especially high quality top end of pipeline, especially towards governance positions.

Jonathan Claybrough 30 Sep 2025 2:17 UTC
4 points
0
on: AI Safety Field Growth Analysis 2025
Thanks for this work.
In your technical dataset apart research appears twice, with 10 and 40 FTEs listed respectively, is that intentional? Is it meant to track the core team vs volunteers participating in hackathons etc?
Can you say a bit more about how these numbers are estimated? eg. 1 person looks at the websites and writes down how many they see, estimating from other public info, directly asking the orgs when possible?

Jonathan Claybrough 30 Sep 2025 2:07 UTC
1 point
0
in reply to: Raemon’s comment on: The Illustrated Petrov Day Ceremony
Yes, it was.
I was teaching an ai safety bootcamp during Petrov day so I organized the participants in two teams with possibility to nuke each other and false alarms. One team did nuke the other. The other didn’t respond, and was eliminated.
It was nice to take a break from that.

Jonathan Claybrough 27 Sep 2025 5:47 UTC
7 points
0
on: The Illustrated Petrov Day Ceremony
Much appreciated. the page itself doesn’t have an upvote button to show recognition so I strong upvoted this one here. This is for me the best petrov day project (though of course made meaningful by all the previous) and I had a great moment reading this alone in my hotel room between work hours. May we all be well (and contribute to us being well).

Jonathan Claybrough 18 Sep 2025 7:33 UTC
142 points
125
on: How To Dress To Improve Your Epistemics
With no shade to John in particular, as this applies to many insular lesswrong topics, I just wanna state this gives me a feeling of the blind leading the blind. I could believe someone reading this behaves in the world worse after reading it, mostly because it’d push them further in the same overwrought see-everything-through-status frame. I think it’s particularly the case here because clothing and status are particularly complex and benefit from a wider diversity of frames to think of them in, and require diverse experiences and feedback from many types of communities to generalize well (or to realize just how narrow every “rule” is!)

I’m not saying John has bad social skills or that this doesn’t contain true observations or that someone starting from zero wouldn’t become better thanks to this, nor that John shouldn’t write it, but I do think this is centrally the kind of article one should consider “reverse all advice you read” for, and would like to see more community pushback and articles providing more diverse frames on this.

I’m confident I could sensibly elaborate more on what’s missing/wrong, but in the absence of motivation to, I’ll just let this comment stand as an agree/disagree rod for the statement “We have no clear reason to believe the author is actually good at social skills in diverse environments, they are writing in a seemingly too confident and not caveated enough tone about a too complicated topic without acknowledging that and are potentially misleading/short term net negative to at least a fifth of lesswrong readers who are already on the worse side of social skills”
What links here?
- Monthly Roundup #35: October 2025 by Zvi (15 Oct 2025 19:50 UTC; 24 points)

Jonathan Claybrough 24 Jun 2025 16:13 UTC
5 points
1
in reply to: ryan_greenblatt’s comment on: My pitch for the AI Village
(I originally read the “they” to mean the AI agents themselves, would be cool if Open Phil did answer them directly)

Jonathan Claybrough 14 May 2025 17:56 UTC
1 point
0
in reply to: Parker Conley’s comment on: The Best Reference Works for Every Subject
The link seems broken

Jonathan Claybrough 18 Apr 2025 5:07 UTC
1 point
0
in reply to: Adam Karvonen’s comment on: Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
Thanks for the followup!

Jonathan Claybrough 16 Apr 2025 18:31 UTC
4 points
0
on: Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study
o3 released today and I was impressed by the demo for working with images they gave so I wanted to try this out (using the prompt op linked to in appendices), but I don’t have the machining experience to evaluate the answer quickly, so I thought I’d prompt it and share its result if anyone else wants to evaluate it : https://chatgpt.com/share/67fff70d-7d90-800e-a913-663b82ae7f33

Jonathan Claybrough 14 Mar 2025 1:21 UTC
13 points
0
on: Announcing Dialogues
PSA—at least as of March 2024, the way to create a Dialogue is by navigating to someone else’s profile and to click the “Dialogue” option appearing near the right, next to the option to message someone.

Jonathan Claybrough 7 Mar 2025 18:00 UTC
12 points
8
on: Jonathan Claybrough’s Shortform
I get the impression from talking to people who work professionally on reducing ai x-risk (largely taken, would include doing ops for FLI) that less than half of them could give an accurate and logically sound model of where ai x-risk comes from and why it is good that they do the work they do (bonus points if they can compare it to other work they might be doing instead). I’m generally a bit disappointed by this because to me it doesn’t seem that hard to get everyone who’s a professional knowledgeable on the basics, and it seems worthwhile as more people could be autonomous in assessing strategic decisions and making sure implementation of plans serves the right aims.

An example of why having a model of x-risk and directions for reducing it matters : Imagine you’re an ops person with money to organize a half-day conference. What do?
What city to choose, which participants to invite?

If you have no model of anything, I guess you can copycat—do a small scale EAG-like event for AI Safety people, or do a scientific conference like thing. That’s okay, probably not a total waste of money (except to the extent you don’t know ai safety theory, it’s possible to do counter-productive actions like giving a platform to the wrong people).

Imagine you have the following simple model :
- AI is being improved by people
- If deployed with the current alignment tech and governance level, powerful autonomous AGI could have convergent instrumental goals that lead to takeover (sharp left turn models) or large scale deployment of AGI in society can eek out human influence (Out with a Whimper or Multipolar Failure)
- Thus we should do actions that get us the best alignment tech and governance level by the time anyone is able to train and deploy dangerous IA systems.
- This means we could do actions that
- - Slow down dangerous AI development
- - Accelerate alignment tech development
- - Accelerate getting a good level of governance

Each of these need sub-models (all interlinked), let’s detail a simple one for governance
Things that accelerate getting a good level of governance
- Better knowledge and models of ai x-risk (scenario+risk planning), demonstrations of precursors (model organisms of misalignment, ..), increasing scientific consensus about the paths to avoid
- Spreading the above knowledge to the general public (to have politicians be supported in economy-limiting policy)
- Spreading the above knowledge to policy-makers
- Having technically competent people write increasingly better policy proposals (probably stuff leading to international regulation of dangerous AI)

Now that you have these, it’s much easier to find a few key metrics to optimize for your ops event. Doing it really well might include you talking to other event organizers to know what has and hasn’t been done, what works, what’s been neglected etc, but even without all that you can decide to act on:
Improvement in attendees’ knowledge and models of AI x-risk
Pre-event and post-event surveys or informal conversations to gauge attendees’ understanding of AI risk scenarios and mitigation strategies.
Structure the event with interactive sessions explicitly designed to clarify AI x-risk models, scenario planning, and concrete policy implications.
Potential reach for disseminating accurate AI safety knowledge
Choose guests who have strong networks or influence with policy-makers, academics, or public media.
Select a location close to influential governmental bodies or major media centers (e.g., Washington D.C., Brussels, or Geneva).

(I realize now that I wrote a full example that this might have been a mini post to serve as short reference, a wireframe for thinking through this, please reply if you think a cleaned up version of this would be good)

To get back to my original point, it’s currently my impression that many who work in AI x-risk reduction, for example in ops, could not produce a “good” version of this above draft even with 2 hours of their time because of lacking background knowledge. I hope the above example sufficiently illustrates that they could be doing better work if they did.

Jonathan Claybrough 2 Feb 2025 20:16 UTC
4 points
0
in reply to: Jan_Kulveit’s comment on: Catastrophe through Chaos
(off topic to op, but in topic to Jan bringing up ALERT)
To what extent do you believe Sentinel fulfills what you wanted to do with ALERT? Their emergency response team is pretty small rn. Would you recommend funders support that project or a new ALERT?

Jonathan Claybrough 19 Jan 2025 10:18 UTC
4 points
4
on: Beards and Masks?
Appreciate the photos and final video, as they also make this informative post more enjoyable to follw through.

Jonathan Claybrough 19 Jan 2025 10:10 UTC
116 points
44
on: Jonathan Claybrough’s Shortform
EpochAI seem do be doing a lot of work that’ll accelerate AI capabalities research and development (eg. informing investors and policy makers that yes AI is a huge economic deal and here are the bottlenecks you should work around, building capabilities benchmarks to optimize for). Under common-around-LW assumptions that no one could align AGI at this point, they are, by these means, increasing AI catastrophic and existential risk.
At a glance they also seem to not be doing AI x-risk reducing moves, like using their platform to mention that there are risks associated to AI, that these are not improbable, and that these require both technical work and governance to manage appropriately. This was salient to me in their latest podcast episode—speaking at length about AI replacing human workers in 5 to 10 years and the impact on the economy, without even hinting that there are risks associated with this, is burying the lede.
Given that Epoch AI is funded by OpenPhilantropy and Jaan Tallinn, who on their face care about reducing AI x-risk, what am I missing? (non rhetorical)
- What is EpochAi’s theory of change for making the world better on net?
- Overall, is EpochAI increasing or reducing ai x-risk (on LW models, in their models)?
I wanted this to be but a short shortpost, but since I’m questioning a pretty big maybe influential org under my true name let me show good faith with proposing reasons that might contribute to what I’m seeing. For anyone unaware of their work, maybe check out their launch post, or their recent podcast episode.
- OpenPhil is unsure about magnitude of AI x-risk so invest in forecasting AI capabilities to know if they should invest more in AI safety.
- EpochAI doesn’t believe AI x-risk is likely and believes that accelerating is overall better for humanity (seems true for some employees but not all)
- EpochAI believes that promoting the information that AGI is economically important and possible soon is better because governments will better govern it than counterfactually
- EpochAI is saying what they think is true without selection to avoid being deceptive (this doesn’t mesh with the next reason)
- EpochAI believe that mentionning AI risks at this stage would hurt their platform and their influence, and are waiting for a more ripe opportunity/better argued paper.
Tagging a few EpochAI folk that appeared in their podcast - @Jsevillamol @Tamay @Ege Erdil
What links here?
- Garrett Baker's comment on jacquesthibs’s Shortform by jacquesthibs (17 Apr 2025 21:36 UTC; 46 points)

Jonathan Claybrough 28 Nov 2024 21:06 UTC
3 points
0
on: ARENA 4.0 Impact Report
Congratz on your successes and thank you for publishing this impact report.

It leaves me unsatiated related to cost effectiveness though. With no idea of how much money was invested in this project to get this outcome, I don’t know if Arena is cost effective compared to other training programs and counterfactual opportunities. Would you mind sharing at least something about the amount of funding this got?

Re
Still, it is also positive if ARENA can help participants who want to pursue a career transition test their fit for alignment engineering in a comparatively low-cost way.
it doesn’t strike me that a 5 week all expenses paid program is a particularly low cost way to find out AI Safety isn’t for you (as compared to for example participating in an Apart Hackathon)

Jonathan Claybrough 5 Nov 2024 19:46 UTC
1 point
0
in reply to: Alfred Harwood’s comment on: Abstractions are not Natural
I don’t actualy think your post was hostile, but I think I get where deepthoughtlife is coming from. At the least, I can share about how I felt reading this post and point out to why, since you seem keen on avoiding the negative side. Btw I don’t think you avoid causing any frustration in readers, they are too diverse, so don’t worry too much about it either.

The title of the piece is strongly worded and there’s no epistimic status disclaimer to state this is exploratory, so I actually came in expecting much stronger arguments. Your post is good as an exposition of your thoughts and conversation started, but it’s not a good counter argument to NAH imo, so shouldn’t be worded as such. Like deepthoughtlife, I feel your post is confused re NAH, which is totally fine when stated as such, but a bit grating when I came in expecting more rigor or knowledge of NAH.

Here’s a reaction to the first part :
- in “Systems must have similar observational apparatus” you argue that different apparatus lead to different abstractions and claim a blind deaf person is such an example, yet in practice blind deaf people can manipulate all the abstractions others can (with perhaps a different inner representation), that’s what general intelligence is about. You can check out this wiki page and video for some of how it’s done https://en.wikipedia.org/wiki/Tadoma . The point is that all the abstractions can be understood and must be understood by a general intelligence trying to act effectively, and in practice Helen Keler could learn to speak by using other senses than hearing, in the same way we learn all of physics despite limited native instruments.

I think I had similar reactions to other parts, feeling they were missing the point about NAH and some background assumptions.

Thanks for posting!

Jonathan Claybrough 5 Nov 2024 16:54 UTC
1 point
0
in reply to: Garrett Baker’s comment on: dirk’s Shortform
Putting this short rant here for no particularly good reason but I dislike that people claim constraints here or there in a way where I guess their intended meaning is only that “the derivative with respect to that input is higher than for the other inputs”.

On factory floors there exist hard constraints, the throughput is limited by the slowest machine (when everything has to go through this). The AI Safety world is obviously not like that. Increase funding and more work gets done, increase talent and more work gets done. None are hard constraints.

If I’m right that people are really only claiming the weak version, then I’d like to see somewhat more backing to their claims, especially if you say “definitely”. Since none are constraints, the derivatives could plausibly be really close to one another. In fact, they kind of have to be, because there are smart optimizers who are deciding where to spend their funding and trying to actively manage the proportion of money sent to field building (getting more talent) vs direct work.