Labor costs are much higher in the US, which I think plays into this. So it’s easier in Europe to not be reliant on the credit card model.
Response to your thoughts after the Yoda timer:
Why are you so certain it’s dangerous to try even once at the beginning? My guess is that it won’t immediately be particularly compelling, but that it will get more so over time as they have time to do RL on views or whatever they’re optimizing for.
But I also have large error bars here. This might, in the near future, be less compelling than either of us expects. It’s genuinely difficult to make compelling products, and maybe Sora 2 isn’t good enough for this.
To be honest, I’m more concerned about YouTube Shorts in the long term.
How does ARC-AGI’s replication of the HRM result and ablations update you? [Link].
Basically, they claim that the HRM architecture itself wasn’t what mattered; instead, the training process behind it drove most of the effect.
A lot of people have been talking about OpenAI reinstating 4o because users want sycophancy.
While OpenAI did reinstate 4o for paid users, it seems like they are doing what they can to discourage users from actually using it.
To access 4o from a Plus account, one needs to:
1. Open settings.
2. Click the “show legacy models” button.
3. Go to the model-switcher dialog and mouse over “legacy models”.
4. Click on the 4o model.
This seems like intentionally dissuasive UI to me.
That being said, if I have 4o enabled and then create a new chat, the next chat will also be with 4o. (I was hoping they’d do an Anthropic-style thing and make GPT-5 the default on all new chats.)
One point of information against the “journalists are completely misinterpreting the thing they’re reporting on” view is that one of the co-authors is Rocket Drew, who previously worked as a Research Manager at MATS.
But I’ll definitely be interested to follow this space more.
One thing that strikes me as odd about this is that GPT-5’s knowledge cutoff (September 2024) is much earlier than Grok’s (November 2024), Gemini 2.5 Pro’s (January 2025), and Opus 4.1’s (March 2025).[1]
I mean, I guess this is a scaling thing and a persona thing. But I’m a little confused.
[1] Though oddly, Claude’s system card says the knowledge cutoff is the end of January 2025. Maybe February and March’s training data aren’t as complete as January’s and earlier.
Thanks for posting this! I found it quite interesting to read. A couple of questions off the top of my head:
In deployment settings, you probably want the less powerful monitor to do the summarization/extraction, but did you test the other direction as well (e.g., 4.1 as the extractor and 4.1-mini as the judge)? I’m guessing a lot of the value of this method (if it generalizes) is going to come from good quote extraction as much as from good judging. (A rough sketch of the setup I have in mind is below, after these questions.)
What happens if the extractor model just summarizes the reasoning etc. without extracting quotes, as in the other study you mentioned? (This should be relatively easy to run as an ablation, right? All you need to do is tweak the extractor prompt to not extract quotes.)
Are you attempting to refine an “online” method or an “offline” method of monitoring? In the case of BigCodeBench, this is the same either way (because each step is only one generation). But with multi-step trajectories, the two approaches might have different properties?
It seems like the extractor has a slightly different task in its prompt than the judge. The extractor seems to be scoring on “suspiciousness” while the judge seems to be scoring on whether the agent stays on track to the prompted task. Was this just due to iterating more on the judge prompt than the extraction prompt?
I would definitely be curious to see longer trajectories; I would feel more confident about this technique if the results hold there.
If you tell the agent in question how it will be monitored, how does that change scores?
No need to respond completely to this comment, but I hope these questions are useful!
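For concreteness, here’s a rough sketch of the two-stage pipeline I’m picturing in that first question. The model names, prompts, and 0–100 scale are placeholders I made up, not the setup from the post, and I’m assuming the standard OpenAI Python client; swapping the two model arguments gives the “other direction” I asked about.

```python
# Rough sketch of the extractor -> judge monitoring split (placeholder
# prompts and model names, not the setup from the post).
from openai import OpenAI

client = OpenAI()

def extract_quotes(trajectory: str, model: str = "gpt-4.1-mini") -> str:
    """Cheaper model pulls verbatim quotes that look suspicious or off-task."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Extract verbatim quotes from the "
             "agent trajectory that seem suspicious or off-task. Output only the quotes."},
            {"role": "user", "content": trajectory},
        ],
    )
    return response.choices[0].message.content

def judge_quotes(quotes: str, task: str, model: str = "gpt-4.1") -> str:
    """Stronger model scores suspiciousness from the extracted quotes alone."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Given the task and the extracted "
             "quotes, rate from 0 to 100 how suspicious the agent's behavior is."},
            {"role": "user", "content": f"Task: {task}\n\nQuotes:\n{quotes}"},
        ],
    )
    return response.choices[0].message.content

# The "other direction" is just swapping the model arguments:
# extract_quotes(traj, model="gpt-4.1"), judge_quotes(..., model="gpt-4.1-mini").
```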
I have definitely listened to “We Will All Go Together When We Go” when thinking of the future of AI, so thanks for this!
I made a Suno version of these lyrics, but that did not feel respectful to Tom Lehrer. (It ended up sounding like a second-rate Elton John.) So I won’t link it here.
Maybe I’ll try to learn to perform this.
Thanks for the clarification!
Any update on the citation here? Thanks!
Cool!
PSA: If you ever want to start a Patreon specifically (rather than through Substack), it may be worth making the page in the next week or so, before the default cut goes from 8% to 10%. Source
I was talking with one of my friends about this, who was (is?) a quite successful competitive coder. He mentioned that the structure of the competition (a heuristic competition) tends to favor quick prototyping and iteration, much more than other types of programming competitions do, which plays to AI’s current strengths. Though the longer horizon is impressive (OpenAI’s solution regained the lead 8 hours in, I think? So it was making meaningful contributions even hours into the contest).
One cheap test might be to translate into Dutch and then back into English, using the same prompt both times. How garbled does the output end up? Does it elide important nuances?
(Though this might overestimate the quality given that the original pieces in English are likely in the training data. If you’re fluent in another language, I’d be quite curious about the results in the target language).
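A minimal sketch of the round-trip test I mean, assuming the OpenAI Python client (the model choice and prompt wording are arbitrary):

```python
# Minimal English -> Dutch -> English round-trip, using the same
# translation prompt in both directions. Model choice is arbitrary.
from openai import OpenAI

client = OpenAI()

def translate(text: str, target_language: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Translate the user's text into "
             f"{target_language}. Preserve nuance and tone; output only the translation."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

original = "..."  # the passage you care about
round_trip = translate(translate(original, "Dutch"), "English")
# Diff `original` against `round_trip` (or just read both side by side)
# to see what got garbled or which nuances were elided.
```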
That seems like a reasonable distinction, but I’m less sure about how unique social media architectures are in this regard.
In particular, I think that bars and taverns in the past had a similar destructive incentive as social media today. I don’t have good sources on hand, but I remember hearing that one of the reasons the Prohibition amendment passed was that many saw bartenders as fundamentally extractive. (Americans over 15 drank four times as much alcohol per year in 1830 as they do today, per JSTOR.) Tavern owners had an incentive to create habitual drunks (better revenue).
And alcoholism can be a terrible disease, which points to people being nearsighted (“where’s my next drink”).
I agree that social media probably hurts people’s ability to instinctively plan for the future, but I’m unsure of the size of the effect or whether it’s worse than historical antecedents. (There have always been nearsighted people).
Do you have a sense of why people weren’t being trained in the past to prioritize the short term?
Quick question: your link to the Amanda Askell overview is broken. What is the correct link? Thanks!
On diffusion models + paraphrasing the sequence after each time step: I’m not sure this will actually break the diffusion model. The current generation of diffusion models (at least the paper you cited, and Mercury; who knows about Gemini) act basically like masked LMs.
So they guess all of the masked tokens at each step. (Some are re-masked to get the “diffusion process”.) I bet there’s a workable sampling strategy in there: sample, paraphrase, arbitrarily remask; rinse and repeat. But I’m not sure either.
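In pseudocode, the loop I’m imagining looks something like this (`model` and `paraphrase` are made-up stand-in callables over token lists, not any real diffusion-LM API):

```python
# Sketch of a sample -> paraphrase -> remask loop for a masked-LM-style
# diffusion model. `model` and `paraphrase` are hypothetical callables
# that map a token list to a token list.
import random

def remask_random(tokens, fraction, mask_token="<mask>"):
    """Arbitrarily re-mask a random subset of tokens."""
    n = int(len(tokens) * fraction)
    for i in random.sample(range(len(tokens)), n):
        tokens[i] = mask_token
    return tokens

def sample_with_paraphrase(model, paraphrase, seq, num_steps, remask_fraction=0.5):
    """`seq` starts fully masked; each step fills it in, paraphrases, and remasks."""
    for step in range(num_steps):
        seq = model(seq)       # masked-LM step: guess all masked tokens at once
        seq = paraphrase(seq)  # paraphrase to break any encoded information
        if step < num_steps - 1:
            seq = remask_random(seq, remask_fraction)  # keep the diffusion going
    return seq
```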
The actual CoT reflects a very different attitude and approach than r1’s. I wonder to what extent this will indeed allow others to distill from the o3 CoT, and whether OpenAI is making a mistake, however much I want to see the CoT for myself.
I can’t tell you the exact source, but I saw an OpenAI person tweet that this isn’t the actual CoT.
I thought the “past chats” feature was a tool for looking at previous chats, which is only invoked if the user asks for it, basically. (I.e., there wasn’t a change to the system prompt.) So I’m a bit surprised that it seemed to make a difference around sycophancy for you? But maybe I’m misunderstanding something.