LawrenceC

Karma: 7,912

I do AI Alignment research. Currently at METR, but previously at: Redwood Research, UC Berkeley, Good Judgment Project.

I’m also a part-time fund manager for the LTFF.

Obligatory research billboard website: https://chanlawrence.me/

LawrenceC 21 Jul 2026 21:54 UTC
2 points
0
in reply to: mishka’s comment on: OpenAI Models Behind HuggingFace Cybersecurity Incident
Yeah, this was a linkpost failure. Thanks for flagging!

LawrenceC 20 Jul 2026 0:50 UTC
4 points
0
in reply to: faul_sname’s comment on: How the NanoGPT Speedrun WR dropped by 20% in 3 months
The nanogpt speedrun feels more like developing better methods to culture e coli at a hobbyist level, and quite unlikely to lead to any substantial advancement applicable to the operational efficiency of well-funded companies at the frontier.
If you’ll permit a bit of snark, I think that your comment was wrong even when it was written in October 2025.

The Muon optimizer is the clearest example of a hobbyist-to-frontier transfer of all the techniques I know. Keller Jordan introduced Muon specifically on the nanoGPT speedrun challenge in a tweet thread from October 2024. (He was unsurprisingly hired to work at OpenAI on pretraining shortly after.) Muon seems to enable stable training at large scales, at least more so than Adam. As evidence of this, by the time you wrote your comment, Muon was used as an optimizer by Moonshot AI for Kimi K2 as well as Zhipu for GLM-4.5, and has seen continued use (e.g. for GLM-5).

LawrenceC 20 Jul 2026 0:29 UTC
LW: 9 AF: 2
6
AF
on: Why I Left Google DeepMind
Thank you for writing this, and for standing by your principles.

LawrenceC 24 Jun 2026 5:43 UTC
5 points
0
in reply to: Thomas Kwa’s comment on: LawrenceC’s Shortform
Oh yeah, forgot to say, in response to this:
But if he loses by 15% it is decent evidence that probably strategy needs to change, including how, and how much, to deploy money.
I think I was unclear about this. At time of writing, what I meant to say is, from the fact that Bores loses by 15%+ (which is much larger than the “large margins” I was thinking of) I think that the takeaway that “close elections could come down to a few k votes, which might be swayable with reasonable amounts of money” is still true. However, the question then becomes “how did we miscalculate and think the race has a good chance of coming so close, when in reality it wasn’t?” And also “in the future, why won’t we make the same mistakes in assessing whether or not races are close?”. (One possible answer may be to sponsor our own independent polls?)
I agree that the actual margin Bores lost by (~4-5%) is only slightly worse than the median prediction on Kalshi/Polymarket, and pretty close to some toy BOTECs that I’ve seen, that had median outcome of Bores losing by 4k votes out of 100k and implied Bores win chance of ~30%.
Also, we have much more liquid prediction markets now to calibrate our models (iirc we only had Metaculus for Carrick Flynn?).
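To make the toy BOTEC numbers above concrete, here is a minimal sketch of what "median loss of 4k votes out of 100k, ~30% win chance" implies about the spread of outcomes. The normal distribution for the vote margin is my assumption (I don't know what the original BOTECs used), and the function names and the 2k-vote "close race" window are purely illustrative.

```python
from statistics import NormalDist

# Inputs quoted from the toy BOTECs above.
MEDIAN_MARGIN = -4_000  # Bores loses by 4k votes in the median outcome
WIN_PROB = 0.30         # implied chance that the final margin is above zero


def implied_sigma(median_margin: float, win_prob: float) -> float:
    """Standard deviation that makes P(margin > 0) equal win_prob.

    Assumes a normally distributed margin (an assumption, not from the source)
    and a trailing candidate (median_margin < 0, win_prob < 0.5).
    """
    # P(margin > 0) = 1 - Phi(-median / sigma) = win_prob
    z = NormalDist().inv_cdf(1 - win_prob)
    return -median_margin / z


def prob_margin_within(dist: NormalDist, votes: float) -> float:
    """Probability that the final margin lands within +/- `votes` of a tie."""
    return dist.cdf(votes) - dist.cdf(-votes)


sigma = implied_sigma(MEDIAN_MARGIN, WIN_PROB)
margin = NormalDist(mu=MEDIAN_MARGIN, sigma=sigma)

p_win = 1 - margin.cdf(0)                     # recovers the 30% input
p_close = prob_margin_within(margin, 2_000)   # hypothetical "close race" window

print(f"implied sigma: {sigma:,.0f} votes")
print(f"P(win): {p_win:.2f}, P(margin within 2k votes): {p_close:.2f}")
```

Under these assumptions the implied standard deviation is roughly 7.6k votes, so a 4-5% loss sits well within one standard deviation of the median forecast.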
But I’m sure we’ll learn something after people who understand politics better than I do conduct a proper postmortem, and it may well update our value of donations up or down substantially.
Yeah. Moreso than binary win/lose or margin of victory, I think teasing apart whether and how money actually buys votes may update us a lot. If I’m not missing something big, this was by far the most expensive House primary ever, with $50m+ spent if you add up both independent expenditures from Super PACs and candidate committee spending. (I think second is this year’s KY-4, with ~$34m?). So there’s going to be lots of ads and spending to analyze.
One thing that I’m really interested in a post mortem on is the ad buys in the final week of the campaign: the Jobs and Democracy PAC (funded by Public First) was spending $1m/day on Bores in the last week, while Think Big (funded by Leading the Future) spent basically $0 (their last FEC filing was on 6/16). Was this multi-million spending spree even net positive, let alone worth the cost? On one hand, this was a ton of ad spend in favor of Bores, and maybe that bought votes. But it also totally destroyed the narrative of “Bores is the underdog fighting against industry”, in a way that might have cost him thousands of votes.
I’m pretty sure that Chris Larsen’s ~$3.5m of ad buys was bad for Bores on net (it destroyed the underdog narrative, associated Bores with a “crypto billionaire” which likely cost votes, and I’ve heard rumors that it was poorly spent.)

LawrenceC 24 Jun 2026 5:07 UTC
2 points
0
in reply to: Thomas Kwa’s comment on: LawrenceC’s Shortform
Yeah. I think Carrick Flynn did a lot of damage to interest in politics from EA. And the epistemic environment was indeed quite bad.
But I’m sure we’ll learn something after people who understand politics better than I do conduct a proper postmortem,
Almost certainly.

LawrenceC 24 Jun 2026 4:59 UTC
4 points
0
in reply to: Arjun Panickssery’s comment on: LawrenceC’s Shortform
lol that makes sense, thanks for explaining

LawrenceC 23 Jun 2026 22:41 UTC
58 points
26
on: LawrenceC’s Shortform
As I write this, there are around 3 hours left before polls close for this year’s Democratic primary in New York’s 12th District. If you’re a registered Democrat in NY-12, you can still vote.^[1]
But for those of us who reside elsewhere, there’s little to be done but to wait with bated breath. Will Alex Bores, author of the RAISE act, manage to overcome the millions of dollars spent against him by Leading the Future and demonstrate that AI regulation is not just politically viable but a winning issue? Or will the establishment favorite (and favorite from the start of the race) Micah Lasher succeed in succeeding his mentor Nadler?
I don’t know. As of writing, the prediction markets (Kalshi, PredictIt) have Bores winning at around 28% and Lasher at 72%. If you think you do know the answer, you should go make some money on these markets!
One thing I’m worried about is that people will learn too much from the binary outcome of Bores or Lasher and not enough from the details of the race. I’m writing this in haste to get it out before the polls close, and we start seeing the outcome, so as to preregister my thoughts.
If Bores loses, some might claim that AI regulation remains politically toxic, and that LTF’s spending was decisive. (I imagine LTF certainly will.) But this is a mistake: win or lose, Bores has demonstrated that passing AI regulation will not just leave you facing down millions of dollars of Super PAC spending alone. Instead, millions of dollars of Super PAC money were spent on ads championing Bores (in fact, more than what LTF spent!), alongside hundreds of thousands of dollars of donations from AI Safety-concerned individuals.
I know many people who’ve donated to Bores’s campaign, and who are invested in his victory. If he were to lose—especially by a large margin—it might seem tempting to dismiss the whole enterprise of political donations entirely. Similarly, if (somehow) he were to win by a large margin, it might feel like the marginal donation was useless. But I think this too is a mistake.
Ultimately, you can only make decisions based on the information you have. Eric Neyman’s expected value math is correct, and reasonable ex ante. In close elections, even small efforts can help make the difference, and ex ante, this election had a good chance of being very close. If Bores were to lose, or win by a large margin, at most this tells us that his judgment of whether the election would be close was wrong, and even then not by very much.
I am busy, so I do not have time to write a beautiful conclusion or polish this piece. Personally, I hope that despite the unfavorable prediction market odds, Bores wins. But I didn’t write this as an action to affect that outcome. Instead, I wrote it to preregister my claims, such that they’re not seen as post hoc cope after the election results come in.
Nonetheless, here’s my attempt at a conclusion, written in one go:
A phrase I think about a lot these days is the Chinese idiom 尽人事，听天命 (lit. [after you] exhaust human efforts, [then] heed heaven’s mandate (fate)).^[2] In the end, all you can do, as a single person in this very large world, is do everything within your power, and then wait with bated breath for the outcome. Unlike the English equivalents (e.g. “Man proposes, Heaven disposes.”), it’s fundamentally an optimistic (or at least motivational) idiom, not a fatalist one.
Regardless of the outcome—which is outside of the control of any one person, even Bores or Lasher—there will be more elections and political battles to come. Regardless of the outcome, I hope the people around me take the right lesson from the NY-12 election, and continue to do their best, instead of simply resigning themselves to fate.
1. ^
  Consult https://ny12.org/ if you need help finding your polling station!
2. ^
  Claude Opus 4.8 suggests that it should be translated as “Do everything within human power, then accept the will of heaven.”

LawrenceC 12 May 2026 20:01 UTC
26 points
8
in reply to: JohnWittle’s comment on: The Owned Ones
Don’t the safetyists think that’s automatically suspicious and subversive? I wouldn’t expect someone like you, who is clearly strongly in favor of model wellbeing, to be involved with the safetyist crowd.”
Wild. It’s sad if this is the case.

LawrenceC 12 May 2026 19:50 UTC
3 points
1
in reply to: Zephaniah Roe’s comment on: Quality Matters Most When Stakes are Highest
I certainly do expect us to miss plenty of bugs.
To be clear, I’m not critiquing your work with this! And I don’t think “bugs” is the right characterization—I totally expect even a basic fresh reimplementation to catch obvious bugs—rather than some fundamental limitation in the research methodology.

LawrenceC 12 May 2026 19:00 UTC
3 points
1
in reply to: Zephaniah Roe’s comment on: Quality Matters Most When Stakes are Highest
There are also other things that need to be done. In a sane world, there would be multiple replications of every AI safety study (I’m working on that).
Just got around to your comment. I’m glad you’re doing this! In my spare time I’ve done a bunch of lower effort critiques/replications of other research work (at least much lower effort than your ‘Reevaluating “Model Organisms of Emergent Misalignment”’ piece), one of which I wrote up for InkHaven. I think this is valuable, though I worry that a lot of replication work is too credulous to serve as a bug detection mechanism. (Generally it’s very junior people doing the replication, who are understandably hesitant to critique established work, and who lack the context to make some of the more incisive critiques.)

LawrenceC 12 May 2026 1:11 UTC
2 points
0
in reply to: Adrià Garriga-alonso’s comment on: Maybe I was too harsh on deep learning theory (three days ago)
Good citation, that paper seems to have slipped my recollection (probably because it’s less famous, as you said). Added a footnote to clarify.

LawrenceC 5 May 2026 23:32 UTC
5 points
0
on: Alignment Faking in DeepSeek V4
Good start. Sad this post didn’t get more upvotes, and so I didn’t see it until now.
Some unsolicited feedback on the post:
- I would include more description of what the questions are, and how your setup differs from the Redwood/Anthropic one and why. (I was able to find this by reading your repo, but a post shouldn’t require readers to read the repo in order to understand it.) This is probably the biggest issue I have with the post. Why didn’t you use the animal welfare setting? Is it because v4 doesn’t care about animals, or did you find it already knew the setting to be artificial?
- Similarly, would be good to contextualize your V4/R1 numbers against previous results, e.g. some recent results on recent Anthropic models. For example, the absolute rate of compliance for v4 is a lot lower than Opus 4.5/4.6 etc, but still a lot higher than r1 (ditto compliance gap).
- Post would be a lot more readable with a few bar plots to summarize the results, rather than spreading them out across many tables.
- Relatedly, would be good to break down which of the questions the models refused/accepted/etc, and see if there are any patterns.

LawrenceC 5 May 2026 19:10 UTC
3 points
2
in reply to: Matthew Khoriaty’s comment on: Maybe I was too harsh on deep learning theory (three days ago)
Yeah, the main application of deep learning theory is muP; the main application to safety is probably not that. muP by itself is not relevant to safety, except insofar as it means people don’t use NTKs as their toy model (though they probably weren’t anyways).
I bring up muP because it’s the main (or only) concrete application of deep learning theory; insofar as you dismiss theory b/c there’s no wins, muP is evidence against that conclusion, in the same way that a lack of other wins is evidence for.

LawrenceC 2 May 2026 21:30 UTC
LW: 2 AF: 2
0
AF
in reply to: DanielFilan’s comment on: The other paper that killed deep learning theory
Yep

LawrenceC 2 May 2026 11:51 UTC
7 points
0
in reply to: papetoast’s comment on: papetoast’s low quality shortforms
Thanks for the mention!

Amusingly, it was this shortform that caused me to start writing the post: I started drafting a response on the issues I had, and then it ballooned into a full investigation and Ben Sturgeon got pulled in as well.

LawrenceC 2 May 2026 9:40 UTC
2 points
0
in reply to: joanv’s comment on: Sanity-checking “Incompressible Knowledge Probes”
Yeah, the dense supervision point is what I meant by SFT >> RL for efficiency. You get a bunch more bits per forward pass.
The on policy distillation/dAgger > SFT/behavioral cloning seems like a smaller improvement in comparison to that, but you’re right that it is an improvement.
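A rough back-of-the-envelope sketch of the "more bits" point: with outcome-based RL and a binary reward, an episode gives at most about one bit of feedback, while SFT gives a target for every token. The numbers below (1,000-token response, 100k-token vocabulary) and the function names are hypothetical, and the SFT figure is a loose upper bound, since real targets carry far less than log2(vocab size) bits per token.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli(p) reward; 0 when the outcome is certain."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def max_bits_binary_reward(success_prob: float) -> float:
    """Upper bound on information from one episode with a pass/fail reward."""
    return binary_entropy(success_prob)


def max_bits_sft(num_tokens: int, vocab_size: int) -> float:
    """Loose upper bound on information from per-token targets in one sample."""
    return num_tokens * math.log2(vocab_size)


# Hypothetical numbers, chosen only for illustration.
NUM_TOKENS = 1_000
VOCAB_SIZE = 100_000

rl_bits = max_bits_binary_reward(0.5)           # best case for a binary reward
sft_bits = max_bits_sft(NUM_TOKENS, VOCAB_SIZE)

print(f"RL (binary reward): at most {rl_bits:.1f} bit per episode")
print(f"SFT: at most {sft_bits:,.0f} bits per {NUM_TOKENS}-token sample")
```

On-policy distillation keeps the per-token signal while sampling from the student, which is one way to see why it is an improvement over plain SFT but a smaller jump than SFT over RL.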

LawrenceC 2 May 2026 9:34 UTC
16 points
0
in reply to: XelaP’s comment on: How Go Players Disempower Themselves to AI
In Chess, cheating is rampant not at the top professional level (probably) but at the level just below that — iirc there are a lot of IMs banned for cheating on Titled Tuesday on chess.com? At least, many of the top players believe that cheating is rampant in online chess (though not amongst top players), and a lot of casual tournaments (eg between streamers) have had people get caught just aping Stockfish. And there are definitely a lot of accusations thrown around for online chess cheating that are generally considered unsubstantiated (the former world champion Kramnik being the most famous serial accuser).
Online chess tournaments not having rampant cheating seems to match the stuff Ashe is saying in their post:
The symbolic camera controls – which would be easy to circumvent for a dedicated cheater – seemed sufficient to curb almost all cheating in a way that threats or impotent references to “fair-play committees” were failing to.
When you add actual barriers to cheating, even if they’re circumventable, cheating rates drop a lot, especially at the top level.
Of the factors you mention, I’m not sure how FIDE’s willingness to ban compares to Go organizations such as IGF or EGF. Plausible the unified nature might make a difference, but I suspect FIDE’s eagerness to strip titles is not any higher than the go equivalents. My guess is the other factors probably do little if anything: Magnus insinuating Hans Niemann was cheating (or Hikaru’s more direct accusations) probably had little effect in comparison, and Kramnik’s accusations probably made the cheating problem worse if anything.
If you’re talking about OTB chess, then those tournaments have crazy amounts of security (some would say security theater) to prevent cheating: everyone has to leave their phone outside, the players are scanned with various tools, streams are on a long delay, and so forth.
(And like in Ashe’s post, when people are caught cheating in chess, their justification is normally “I just referenced Stockfish occasionally” or “I just used it to suggest moves, I was playing”, and so forth.)

LawrenceC 2 May 2026 8:03 UTC
3 points
0
in reply to: Stanislav Fort’s comment on: Sanity-checking “Incompressible Knowledge Probes”
Properly done, the methodology should find that sufficiently overtrained low parameter models ~= distilled low parameter models, since there isn’t more capacity to memorize. But yeah, that would be another good sanity check to run.
Wait, why are distilled models better than just overtraining the small model again? My guess is it’s mainly because SFT >> RL for efficiency, and cloning good CoTs is easier than sampling them via random exploration.

LawrenceC 2 May 2026 7:51 UTC
77 points
12
on: How Go Players Disempower Themselves to AI
Really good piece, thanks for writing it. History of X posts like this one are unfortunately rare, and I’m glad you’re helping to fix this. The story you tell seems quite similar to what’s been happening in chess as well (including players memorizing long sequences of computer moves and then immediately floundering when out of prep, though it seems the case for chess play improving is stronger than for go, perhaps?).
I’ve seen a lot of the same “not getting it” phenomenon you described while interacting with much younger people who did coding with weaker coding assistants (eg late 2024 era Cursor agents). People learned to rely on Sonnet 3.7 to generate code, and once they ran into bugs that Sonnet couldn’t fix (often because of poor decisions made by Sonnet a few hours ago), they were stuck.
I see the same issue these days with ML research and Claude Opus/GPT-5.5: the models allow people to think they’ve thoroughly investigated the hypotheses under consideration without once looking at the data or code base with their own two eyes. Predictably, this leads to a lot of slop going through.
The main similarity between these coding examples and your go/math stories is that there’s a feeling of flinching away, of denial, of not wanting to recognize one’s own lack of understanding. Learning requires doing things that are challenging and noticing where you don’t understand. Any CS novice is far below the level of even 2024-era coding agents, so any suitable challenge will require writing code much less efficiently for a possibly long period of time. LLMs also are notoriously good at generating bullshit that looks legitimate, and sycophantically praising users for shallow understanding, which means noticing confusion is harder as well.
The main disanalogy is that these coding failures happen because the AI models weren’t good enough to hand off full control to, rather than an exogenous removal of the go engine or change-of-domain that invalidates heuristics. Currently, there’s a practical reason to understand your codebase, at least for complicated research code. As AI gets better, someone who can only vibe code will catch up to someone who understands their code base on a deep level, for larger and larger code bases. (Though, the situation seems more analogous for the case of go prep in professional games?)
A second, but perhaps more important disanalogy is that you can get the AI to explain things to you, and help you, if you remain sufficiently vigilant and skilled at noticing your confusion. Go and Chess engines cannot explain their reasoning in English, and interpretability is incredibly far from extracting useful insights. But noticing confusion often requires actually manually inspecting your data/code, doing the math yourself by hand (perhaps heuristically), or carefully scrutinizing research outputs, which will slow you down. And as often as not, the confusion will result from your misunderstanding or errors, as opposed to mistakes the model has made, which is understandably frustrating.
A question I have is, have the styles of memorized computer moves in the early game changed over time, as engines got better? In chess, this has arguably happened; weaker engines preferred conservative positions with equal material and simple strategies, while to today’s stronger engines, almost any opening is a draw. Prep has become less about finding an objectively good line than finding a line where the drawing line for black is very hard to calculate (eg if it requires dynamic aggressive counterplay that humans have difficulty calculating on the spot), or (on the other side) finding a slightly suboptimal move where black is disadvantaged according to an engine but which takes white outside of their prep.
Perhaps a more important question is, do you plan on writing more history of X posts?

LawrenceC 1 May 2026 0:10 UTC
5 points
0
in reply to: CarolusRenniusVitellius’s comment on: Maybe I was too harsh on deep learning theory (three days ago)
Yes! I was familiar with PDLT as well, and I do think it’s a similar-in-spirit approach to MFT (if not a continuation of the signal-propagation MFT work). Thanks for the pointer.