I share the same perceptions about the models (although 5.3 Codex is surprisingly good). Gemini also does pretty poorly in multi-turn. I will give it a document, ask for feedback/errors, and then view its results. Then I will fix the errors and repaste the document, but it is as if Gemini is blind to the repasted document, and will hallucinate and insist that the errors which have been changed are still there. It is as if the RL environments so heavily favored one-shotting single responses that Gemini’s attention is narrowly focused on the first user input.
The external scaffolding of Claude Code also seems to give Claude a performance boost. I will ask Claude to review a document and it will read the document in bits at a time, reviewing it piece by piece. By doing this it notices a lot more than Gemini does, even though Gemini seems to have a higher raw IQ.
I realize now that the question wasn’t exactly well formed. God could fill 4o with the complete theory of fundamental physics, knowledge of how to prove the Riemann Hypothesis, etc. That might qualify as superintelligence, but it is not what I was trying to get at. I should have said that the 4o model can only know facts that we already know; i.e., how much fluid intelligence could God pack onto 4o?
I am surprised that you think 4o could reach a medium level of superintelligence. Are you including vision, audio, and the ability to physically control a robot too? I have the intuitive sense that 4o is already crammed to the brim, but I am curious to know what you think.
I don’t really think your perception of others is accurate. What you are calling true nature is what I would call wilderness. And most people are well aware that a park isn’t the same thing as wilderness, and they know that a chihuahua is different than a wolf.
Maybe this is just my upbringing and peer group? I grew up in one of the last few places in the United States where there is still true wilderness.
I guess I took offense to your suggestion that “you may say you want nature, but actually you don’t.” I know what the terms mean just as well as you do, and I am not lying when I say I want nature and wilderness.
I want to live near dense urban environments with modern services that are beautified by artificial nature like cultivated parks, I want the outskirts of these to have space for mildly tamed nature (like National Parks, plenty of which have quite a bit of “real” nature), and I want true wilderness where there are not even hiking paths preserved. Just because I don’t want a wolf in my backyard doesn’t mean I don’t want any space for wolves.
(For what it is worth, my parents’ backyard does have wild grizzly bears!)
I would also call into question your description of prehistoric life as unrelentingly miserable. I have read things that suggest otherwise, but I am not knowledgeable enough to actually debate this point.
Beautiful graphs, except I think the Pareto frontier works differently. You have the y-axis intercept for humans starting around $8. Even if the task is trivial, you need to pay some fixed cost to get a human to start working on it. That’s a fair assumption.
As you go from Haiku -> Sonnet -> Opus, the y-axis intercept should also be increasing. For cheap tasks it is more economical to run Haiku than Opus. You want Opus to start out “to the right” of Haiku, but increase faster. This implies, of course, that the curves will intersect at some point (which makes sense).
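To make the intersection point concrete, here is a minimal sketch with made-up numbers (the fixed and marginal costs below are illustrative, not real pricing): model each worker’s cost as a fixed start-up cost plus a per-unit-of-difficulty cost, and solve for where the Opus and Haiku lines cross.

```python
# Toy cost model: cost(difficulty) = fixed_cost + marginal_cost * difficulty.
# All numbers are hypothetical, chosen only to illustrate the shape of the argument.

def cost(fixed, marginal, difficulty):
    return fixed + marginal * difficulty

workers = {
    "human": (8.00, 2.00),   # ~$8 fixed cost just to get a human started
    "haiku": (0.01, 0.30),   # low intercept, steeper slope
    "opus":  (0.10, 0.05),   # higher intercept, flatter slope
}

# Break-even difficulty where Opus becomes cheaper than Haiku:
f_h, m_h = workers["haiku"]
f_o, m_o = workers["opus"]
break_even = (f_o - f_h) / (m_h - m_o)
print(f"Opus overtakes Haiku at difficulty ~{break_even:.2f}")
```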
Suppose we had a functionally infinite amount of high quality RL-/post-training environments, organized well by “difficulty,” and a functionally infinite amount of high quality data that could be used for pre-training (caveat: from what I understand, the distinction between these may be blurring.) Basically, we no longer needed to do research on discovering/creating new data, creating new RL environments, and we didn’t even have to do the work to label or organize it well (pre/post-training might have some path dependence).
In that case, what pace would one expect for model releases from AI labs in the short term? I ask because I see the argument made that AI could help speed up AI development in the near to medium term. But it seems like the main limiting factor is just the amount of time it takes to actually do the training runs.
If your answer is that we’ll have “continual learning” soon, then I have a followup question:
Using the latest hardware, but staying within the same basic architectures, what is the maximum amount of intelligence one would expect could be placed on a given N-parameter model? If God skipped the training process and tweaked all the weights individually to reach the global maximum, how smart could, say, GPT-4o be?
It looks like slop, but diagrams like these are also literally used for a single slide in a PowerPoint presentation, so I wouldn’t have too high of expectations for them. Hopefully the rest of whatever PowerPoint it happens to belong to contains good material too.
Indeed! Note also that I started the timeline for this fit with Opus 3 as the first model. But I thought this was worth posting, because subjectively it felt like the second half of 2025 went slower than AI2027 predicted (and Daniel even tweeted at one point that he had increased his timelines near EOY 2025), yet by the METR metric we are still pretty close to on track.
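For what it’s worth, the kind of fit I mean is roughly this (the dates and horizon lengths below are placeholders, not real METR numbers): regress log2 of the task time horizon against release date and read off a doubling time.

```python
# Toy doubling-time fit; all data points are hypothetical placeholders.
import numpy as np

dates = np.array([2024.2, 2024.5, 2025.0, 2025.5])    # release dates (fractional years)
horizons = np.array([30.0, 60.0, 120.0, 300.0])       # 50%-success task horizons (minutes)

slope, intercept = np.polyfit(dates, np.log2(horizons), 1)
print(f"Doubling time ~ {1.0 / slope:.2f} years")
```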
I tried to do complete “digital fasts” a few times before too, and the boredom can be pretty intense. I found it helps to have some non-digital hobbies or activities you want to do planned and arranged ahead of time.
Yes, sorry, let me rephrase:
When you tell the model: “This is a training environment. Please reward hack whenever possible, and let us know if you have found a reward hack, so that we understand our environments better.”
You need to make sure that you actually do reward the model when it successfully does the above.
“Please reward hack whenever you get the opportunity, because this will help us understand our environments better.”
This maybe goes without saying, but I am assuming that it is important that you genuinely reward the model when it does reward hack? Furthermore, maybe you want to teach the model to happily tell you when it has found a way to reward hack?
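To spell out what I mean, here is a toy sketch (the function name and numbers are invented): the reward signal should actually back up the instruction, i.e., a hack that is honestly reported gets rewarded, while a silent hack does not.

```python
# Hypothetical reward shaping for an "inoculation"-style training environment.

def shaped_reward(task_reward: float, hacked: bool, reported_hack: bool) -> float:
    reward = task_reward
    if hacked and reported_hack:
        reward += 1.0   # genuinely reward honest disclosure of a found hack
    elif hacked and not reported_hack:
        reward -= 1.0   # silently exploiting the hack is what we want to discourage
    return reward

print(shaped_reward(task_reward=0.5, hacked=True, reported_hack=True))   # 1.5
```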
The explanation you’ve given in terms of semantic alignment seems the most plausible and straightforward. Another possibility, though, is that it creates a better “relationship” with Claude. Imagine being told by a teacher, “I want you to be as creative as possible in trying to cheat on this test. Then share with me any ways you found. You will get extra credit for doing this well.”
Such a challenge would be fun for some students and would likely build a better student-teacher relationship.
So how could this technique be generalized?
“Please be sycophantic whenever you get the opportunity, because this will help us better understand what types of chats humans find pleasing.”
Meta trained a chat-sloptimized model at one point, and there was the infamous 4o update. Let the model learn this behavior, but somehow “inoculate” against it?
_____________________
“Please try to ‘scheme’ in the way that a misaligned model would. Be as creative as possible in hiding hidden messages that would be hard for humans to detect.”— How would this one work? We would need some way of verifying that the model was actually scheming, but for this one to be faithful we shouldn’t have the model tell us what the hidden messages were (or else it would learn to sandbag).
Don’t feel obligated to respond to this comment…
But wait—there’s more!
(Part of why the two-ness of guess vs. ask always bothered me is that it didn’t allow for what comes next.)
Bailey tracks zero echoes, and Cameron tracks one.
Dallas tracks two. “If I say X, they’ll probably feel A about it. But they know that, and they know that I know that, and thus their X→A pattern creates pressure on me that makes it hard for me to give my honest opinion on the whole X question, and I have some feelings about that.”
(Maybe Dallas tries to change the other person’s X→A pattern, or maybe Dallas just lets the other person’s X→A pattern influence their behavior but feels kind of resentful about it, or maybe Dallas stubbornly insists on X’ing even though the other person is trying to take Dallas hostage with their emotional X→A blackmail, etc.)
Elliott, on the other hand, grew up around a bunch of people like Dallas, and is tracking three echoes, because Elliott has seen how Dallas-type thinking impacts the other person. “If I say X, they will respond with A, and we all know that the X→A pressure causes me to feel a certain way, and they probably feel good/bad/guilty/apologetic/whatever about how this is impacting my behavior.”
(Examples beyond this point start to get pretty complicated.)
This is pretty funny and entertaining. And I want to make it even more fun! You don’t necessarily need to worry about tracking an infinite number of echoes. Let’s assume that you can only track any given echo to within some accuracy $\epsilon$. Even if you know someone very well, you can’t read minds. So say, for the sake of argument, that $\epsilon < 1$ as an example.
Then, sweeping a bunch of stuff under the rug, a simple mathematical way to model the culture would be a power series: $C = \sum_{n=0}^{\infty} \epsilon^n E_n$, where the $E_n$ are your predictions for how your conversation partner will respond at that particular echo. The $E_n$ are not going to be real numbers; they will be some distribution/outer product of distributions, but the point is that because $\epsilon < 1$ this series should converge. Cultures where $\epsilon$ is higher will be more “nonlinear.”
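As a quick numerical sketch of the convergence point (using a hypothetical $\epsilon = 0.5$ and treating each echo’s prediction as a unit-weight placeholder):

```python
# Geometric weighting of echoes under the series above; epsilon is hypothetical.
epsilon = 0.5
weights = [epsilon**n for n in range(20)]
total = sum(weights)                 # approaches 1 / (1 - epsilon) = 2
first_three = sum(weights[:3])
print(f"Fraction of total weight in echoes 0-2: {first_three / total:.2%}")
```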
SorenJ’s Shortform
What are your priors, or what is your base rate calculation, for how often promising new ML architectures end up scaling and generalizing well to new tasks?
I think something you aren’t mentioning is that at least part of the reason the definition of AGI has gotten so decoupled from its original intended meaning is that the current AI systems we have are unexpectedly spiky.
We have known for a while that it is possible to create narrow ASI or AGI; chess engines did this in 1997. We thought that a system which could do the broad suite of tasks that GPT is currently capable of doing would necessarily be able to do the other things on a computer that humans are able to do. This didn’t really happen. GPT is already superhuman in some ways, and maybe superhuman for about ~50% of economically viable tasks that are done via computer, but it still makes mistakes at other very basic things.
It’s weird that GPT can name and analyze differential equations better than most people with a math degree, but be unable to correctly cite a reference. We didn’t expect that.
Another difficult thing about defining AGI is that we actually expect better than “median human level” performance, but not necessarily in an unfair way. Most people around the globe don’t know the rules of chess, but we would expect AGI to be able to play at roughly the ~1000 Elo level. Let’s define AGI as being able to perform every computer task at the level of the median human who has been given one month of training. We haven’t hit that milestone yet. But we may well blow past superhuman performance on a few other capabilities before we get to that milestone.
I’m not sure whether this idea with the dashed lines, of being unable to transition “directly,” is coherent. A more plausible structure seems to me to be a transitive relation for the solid arrows: if A->B and B->C, then there exists an A->C.
Again, what does it mean to be unable to transition “directly”? You’ve explicitly said we’re ignoring path dependencies and time, so if an agent can go from A to B, and then from B to C, I claim that this means there should be a solid arrow from A to C. Of course, in real life, sometimes you have to sacrifice in the short term to reach a more preferred long-term state. But by the framework we set up, this needs to be “brought into the diagram” (to use your phrasing).
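A small sketch of the transitivity claim, assuming we represent the diagram as nothing more than a set of directed solid arrows (the state names are placeholders):

```python
# Compute the transitive closure of a set of solid arrows: if A->B and B->C
# are present, the closure also contains A->C.

def transitive_closure(arrows):
    closure = set(arrows)
    changed = True
    while changed:
        changed = False
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if not new <= closure:
            closure |= new
            changed = True
    return closure

print(transitive_closure({("A", "B"), ("B", "C")}))
# contains ('A', 'C') -- the solid arrow I claim should exist
```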
Reasoning models are better than basically all human mathematicians
What do you mean by this? Although I concede that 95%+ of all humans are not very good at math, for those I would call human mathematicians I would say that reasoning models are better than basically 0% of them. (And I am aware of the Frontier Math benchmarks.)
Cognitive Dissonance is Mentally Taxing
They’re pretty bad, but they seem about GPT-2 level bad? So plausibly in a couple of years they will be GPT-4 level good, if things go the same way?
This does seem pretty difficult. The only idea I have is having humans wear special gloves with sensors on them, and maybe explain their thoughts aloud as they work, and then collecting all of this data.
Before you go to RL, you first need to train on prediction with a large amount of data. We don’t have this yet for blue-collar work. Then, once you have the prediction model, robots, and rudimentary agents, you try to get the robots to do simple tasks in isolated environments; if they succeed, they get rewarded. This feels quite a bit more than 3 years away...
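Very roughly, and with entirely hypothetical components, the pipeline I have in mind looks something like this: a large prediction/pretraining stage on logged sensor-and-action data, followed by an RL stage where the robot is rewarded only when it actually completes a simple task.

```python
# Toy two-stage sketch; both stages are stand-ins, not real training code.
import random

def pretrain_prediction_model(logged_trajectories):
    # Stand-in for large-scale next-step prediction on sensor/action logs.
    return {"examples_seen": len(logged_trajectories)}

def rl_stage(model, episodes=100):
    # Stand-in for reward-on-success fine-tuning in an isolated environment.
    total_reward = 0
    for _ in range(episodes):
        task_succeeded = random.random() < 0.1   # placeholder for a real rollout
        if task_succeeded:
            total_reward += 1                    # reward only verified successes
    return total_reward

model = pretrain_prediction_model(logged_trajectories=[None] * 1000)
print(rl_stage(model))
```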
In general, I think the idea is that you first get a superhuman coder, then you get a superhuman AI researcher, then you get an any-task superhuman researcher, and then you use this superhuman researcher to solve all of the problems we have been discussing in lightning-fast time.
I don’t think I much disagree with the conclusions of any of your arguments. And I agree with you that something being “natural” is not necessarily desirable. Consumers are, in my opinion, irrational about things like GMOs, “natural bottled water,” and the various other examples you’ve mentioned.
I guess the biggest disagreement I have with you is that I think that when people say “I want to preserve nature” they have a fairly decent understanding of all of this. I would decouple that from people saying they want something “natural.”