My personal website is https://jimfund.com/.
james oofou
I guess they’re losing money in the short term but gaining training data and revenue (which helps them raise funds). It’s not clear to me that this is harming the lab in expectation.
Them: “I think X”
You: “That’s wrong because Z”
Them: “I think you’re just disagreeing because you’re not open-minded enough”
You: “What makes you think that?”
Them: “I think it because Y”
What do they say for ‘Y’? That seems to be the part that actually constitutes their argument, and the part you’ll be able to call out if they’re making a mistake.
Near-Future Fiction III
September 2026
Knowledge worker productivity has become decoupled from pre-ChatGPT levels, as the hardest technical tasks these workers faced in a given working day back then can now, in most cases, be carried out autonomously by AI.
Programmers therefore begin to work at a higher level of abstraction, guiding AI workers and managing projects.
Meanwhile, much progress is being made in robotics. Full self-driving has been achieved.
And AI has begun making novel breakthroughs. This enables continual learning: the AI’s new discoveries open up many new avenues for further discoveries, which open up many more such avenues, ad infinitum.
Image from a recent OpenAI talk
December 2026
Successful reinforcement learning on the September worker AIs has enabled AI to operate at the higher level of abstraction to which software engineers had retreated. Human knowledge workers are therefore relegated to maintenance work and to helping out when the few remaining weak points in these AI systems cause trouble.
Progress in AI intelligence relative to human intelligence begins to get rapidly easier as time horizons extend beyond a few hours: at horizons of this length, humans come to rely on caching tricks, iteration, brute force, etc., rather than on fundamentally more difficult leaps of insight.
Early 2027
Humans are cut out of the loop entirely in knowledge work. The robotics explosion happens. Robots gradually replace humans in physical labour. AI progresses far beyond human-level.
Mid 2027
Humans fully obsolesce. Mind upload is achieved.
Notes
I assume a 3-month METR doubling time. We should expect lower doubling times over time given increased investment in AI, increased contribution by AI to progress, and decreased difficulty per doubling. Also, OpenAI has communicated that we should expect several major breakthroughs from them in 2026.
We should expect doubling times to decrease even further with time, although in a discontinuous way, so it’s impossible to predict with much accuracy when each decrease will occur.
It seems that your argument is based on high confidence in a METR time-horizon doubling time of roughly 7 months. But the available evidence suggests the doubling time is significantly lower.
In recent years we have observed shorter doubling times:
And what we know about labs’ internal models suggests this faster trend is holding up:
An important piece of evidence is OpenAI’s Gold performance at the International Mathematics Olympiad (IMO):
IMO participants get an average of 90 minutes per problem.
The gold medal cutoff at IMO 2025 was 35 out of 42 points (~83%).
They needed to get 5/6 problems fully correct (each problem awards a maximum of 7 points), or an equivalent number of points.
This is a bit rough, but if their model had a METR-80 greater than 90 minutes then we would expect OpenAI to achieve Gold at least 50% of the time.
OpenAI staff members stated that a publicly released model of this capability could be expected at roughly the end of the year (and our METR trends are of course projections of publicly available models).
So, this implies a METR-80 greater than 90 minutes as of December 2025.
The projected METR-80 for December 2025, assuming a 3.45-month doubling time, is 98 minutes.
So the Gold performance, which was a massive surprise to many, is actually right on-trend for a 3.45-month doubling time. Of course, one might object that OpenAI may have just gotten lucky. But Google also got Gold, so we have two data points.
And here’s a recent comment from Sam Altman where he states that he expects time horizons days in length in 2026: “and as these go from multi-hour tasks to multi-day tasks, which I expect to happen next year”.
A 7-month doubling time would not get us there, but it is in line with a 3.45-month doubling time (which would put the time horizon at roughly 3 days by December 2026).
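As a sanity check, here’s a minimal sketch of the projection arithmetic in Python, using only figures from this comment (the 98-minute December 2025 METR-80 and the 3.45-month doubling time); the conversion from hours into eight-hour working days is my own assumption.

```python
# Sketch of the doubling-time projection above. Inputs are the comment's
# own figures; the 8-hour-working-day conversion is an assumption.
def project(horizon_min: float, months_ahead: float, doubling_months: float) -> float:
    """Time horizon after `months_ahead` months at a fixed doubling time."""
    return horizon_min * 2 ** (months_ahead / doubling_months)

h = project(98, 12, 3.45)  # Dec 2025 -> Dec 2026
print(f"{h:.0f} min = {h/60:.1f} h = {h/(60*8):.1f} eight-hour working days")
# -> 1092 min = 18.2 h = 2.3 working days: solidly in Altman's multi-day
#    regime, and roughly consistent with the "3 days" figure above.
```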
And here’s a recent comment from Jakub Pachocki:
Your argument that OpenAI stole money here is poorly thought-out.
OpenAI’s ~$500b valuation priced in a very high likelihood of it becoming a for-profit.
If it wasn’t going to be a for-profit its valuation would be much lower.
And if it wasn’t going to be a for-profit the odds of it having any control whatsoever over the creation of ASI would be very much reduced.
It seems likely the public gained billions from this.
The text “the website of the venue literally says” appears twice in your post. The first time it appears seems to be a mistake and isn’t followed by a quotation.
Is this distinct from the problem of induction?
I’ve been looking for a few years now for science fiction set in the late 2020s which addresses continued AI progress. Everything else just feels so totally disconnected from any plausible future. Very happy to have found your writing.
You are misunderstanding what METR time-horizons represent. The time-horizon is not simply the length of time for which the model can remain coherent while working on a task (or anything which corresponds directly to such a time-horizon).
We can imagine a model with the ability to carry out tasks indefinitely without losing coherence but which had a METR 50% time-horizon of only ten minutes. This is because the METR task-lengths are a measure of something closer to the complexity of the problem than the length of time the model must remain coherent in order to solve it.
Now, a model’s coherence time-horizon is surely a factor in its performance on METR’s benchmarks. But intelligence matters too. Because the coherence time-horizon is not the only factor in the METR time-horizon, your leap from “Anthropic claims Claude Sonnet can remain coherent for 30+ hours” to “If its METR time-horizon is not in that ballpark that means Anthropic is untrustworthy” is not reasonable.
You see, the tasks in the HCAST task set (or whatever task set METR is now using) tend to be tasks some aspect of which cannot be found in much shorter tasks. That is, a task of length one hour won’t be “write a programme which quite clearly just requires solving ten simpler tasks, each of which would take about six minutes to solve”. There tends to be an overarching complexity to the task.
Do you think maybe rationalists are spending too much effort attempting to saturate the dialogue tree (probably not effective at winning people over) versus improving the presentation of the core argument for an AI moratorium?
Smart people don’t want to see the 1000th response on whether AI actually could kill everyone. At this point we’re convinced. Admittedly, not literally all of us, but those of us who are not yet convinced are not going to become suddenly enlightened by Yudkowsky’s x.com response to some particularly moronic variation of an objection he already responded to 20 years ago. (Why does he do this? Does he think it has any kind of positive impact?)
A much better use of time would be to work on an article which presents the solid version of the argument for an AI moratorium. I.e., not an introductory text or an article in Time Magazine, and not an article targeted at people he clearly thinks are extremely stupid relative to him, at whom he rants for 10,000 words trying to drive home a relatively simple point. Rather, an argument in a format that doesn’t necessitate a weak or incomplete presentation.
I and many other smart people want to see the solid version of the argument, without the gaping holes which are excusable in popular work and rants but inexcusable in rational discourse. This page does not exist! You want a moratorium, tell us exactly why we should agree! Having a solid argument is what ultimately matters in intellectual progress. Everything else is window dressing. If you have a solid argument, great! Please show it to me.
Soares is failing to grapple with the actual objection here.
The objection isn’t that the universe would be better with a diversity of alien species, which would be so cool, interesting, and {insert additional human value judgements here}, just as long as they also keep other aliens and humans around.
The objection is specifically that human values are base and irrelevant relative to those of a vastly greater mind, and that our extinction at the hands of such a mind is not of any moral significance.
The unaligned ASI we create, whose multitudinous parameters allow it to see the universe with such clarity and depth and breadth and scalpel-sharp precision that whatever desires it has are bound to be vastly beyond anything a human could arrive at, does not need to value humans or other aliens. The point is that we are not in a place to judge its values.
The “cosmopolitan” framing is just a clever way of sneaking in human chauvinism without seeming hypocritical: by including a range of other aliens he can say “see, I’m not a hypocrite!”. But it’s not a cogent objection to the pro-ASI position. He must either provide an argument that humans actually are worthy, or admit to some form of chauvinism. If the latter, he should grapple with the fact that he walks a narrow path, and rid himself of the condescending tone and sense of moral superiority if he wishes to grow his coalition, as these attributes only serve to repel anyone with enough clarity of mind to understand the issues at hand.
And his view that humans would use aligned ASI to tile the universe with infinitely diverse aliens seems naive. Surely we won’t “just keep turning galaxy after galaxy after galaxy into flourishing happy civilizations full of strange futuristic people having strange futuristic fun times”. We’ll upload ourselves into immortal personal utopias, and turn our cosmic endowment into compute to maximise our lifespans and luxuriously bespoke worldsims. Are we really so selfless, at a species level, as to forgo utopia for some incomprehensible alien species? No; I think the creation of an unaligned ASI is our only hope.
Now, let’s read the parable:
We never saturate and decide to spend a spare galaxy on titanium cubes
The odds of a mind infinitely more complicated than our own having a terminal desire we can comprehend seem extremely low.
Oh, great, the other character in the story raises my objection!
OK, fine, maybe what I don’t buy is that the AI’s values will be simple or low dimensional. It just seems implausible
Let’s see how Soares handles it.
Oh.
He ignores it and tells a motte-and-bailey flavoured story about an AI with simple and low-dimensional values.
Another article, about how AI might not be conscious, is linked as well. I’ll read that too, and might respond to it.
The rise of this kind of thing was one of my main predictions for late 2025:
That is a 1 in 20 chance, which feels recklessly high.
Is this feeling reasonable?
A selfish person will take the gamble of 5% risk of death for a 95% chance of immortal utopia.
A person who tries to avoid moral shortcomings such as selfishness will reject the “doom” framing because it’s just a primitive intelligence (humanity) being replaced with a much cleverer and more interesting one (ASI).
It seems that you have to really thread the needle to get from “5% p(doom)” to “we must pause, now!”. You have to reason such that you are not self-interested but are also a great chauvinist for the human species.
This is of course a natural way for a subagent of an instrumentally convergent intelligence, such as humanity, to behave. But unless we’re taking the hypocritical position that tiling the universe with primitive desires is OK as long as they’re our primitive desires, it seems that so-called doom is preferable to merely human flourishing.
So it seems that 5% is really too low a risk from a moral perspective, and an acceptable risk from a selfish perspective.
I’m trying to look at how increasing model time-horizons amplify AI researcher productivity. For example, if a researcher had a programming agent which could reliably complete programming tasks of up to a week in length, would the researcher be able to automate thousands of experiments in parallel using these agents? That is, come up with a bunch of possibly-interesting ideas and just get the agent to iterate over a bunch of variations of each idea? Or are experiments overwhelmingly compute-constrained rather than programming-time constrained?
Someone approaches you with a question:
“I have read everything I could find that rationalists have written on AI safety. I came across many interesting ideas, I studied them carefully until I understood them well, and I am convinced that many are correct. Now I’m ready to see how all the pieces fit together to show that an AI moratorium is the correct course of action. To be clear, I don’t mean a document written for the layperson, or any other kind of introductory document. I’m ready for the real stuff now. Show me your actual argument in all its glory. Don’t hold back.”
After some careful consideration, you:
(a) helpfully provide a link to A List of Lethalities
(b) suggest that he read the sequences
(c) patiently explain that if he was smart enough to understand the argument then he would have already figured it out for himself
(d) leave him on read
(e) explain that the real argument was written once, but it has since been taken down, and unfortunately nobody’s gotten around to rehosting it since
(f) provide a link to a page which presents a sound argument[0] in favour of an AI moratorium
===
Hopefully, the best response here is obvious. But currently no such page exists.
It’s a stretch to expect to be taken seriously without such a page.
[0] By this I mean an argument whose premises are all correct and which collectively entail the conclusion that an AI moratorium should be implemented.
How good is the argument for an AI moratorium? Tools exist which would help us get to the bottom of this question. Obviously, the argument first needs to be laid out clearly. Once we have the argument laid out clearly, we can subject it to the tools of analytic philosophy.
But I’ve looked far-and-wide and, surprisingly, have not found any serious attempt at laying the argument out in a way that makes it easily susceptible to analysis.
Here’s an off-the-cuff attempt:
P1. ASI may not be far off
P2. ASI would be capable of exterminating humanity
P3. We do not know how to create an aligned ASI
P4. If we create ASI before knowing how to align ASI, the ASI will ~certainly be unaligned
P5. Unaligned ASI would decide to exterminate humanity
P6. Humanity being exterminated by ASI would be a bad thing
C. Humanity should implement a moratorium on AI research until we know how to create an aligned ASI
My off-the-cuff formulation of the argument is obviously far too minimal to be helpful. Each premise has a wide literature associated with it and should itself have an argument presented for it (and the phrasing and structure can certainly be refined).
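As a gesture at what such refinement could look like, here is a toy machine-checkable rendering of the skeleton above (my own sketch, in Lean 4; the propositional names and the folding of P2 into P5 are mine). Notably, it only goes through once an explicit bridging premise is added, which is exactly the kind of hidden step a canonical formulation would surface.

```lean
-- Toy propositional skeleton of the off-the-cuff argument above.
-- `bridge` is a premise the informal version leaves implicit: roughly,
-- that a likely, bad, preventable outcome warrants a moratorium.
variable (ASISoon NoKnownAlignment Unaligned Exterminates Bad Moratorium : Prop)

theorem moratorium_argument
    (p1 : ASISoon)                           -- P1
    (p3 : NoKnownAlignment)                  -- P3
    (p4 : NoKnownAlignment → Unaligned)      -- P4
    (p5 : Unaligned → Exterminates)          -- P5 (with P2 folded in)
    (p6 : Exterminates → Bad)                -- P6
    (bridge : ASISoon → Exterminates → Bad → Moratorium) :
    Moratorium :=
  have ext : Exterminates := p5 (p4 p3)
  bridge p1 ext (p6 ext)
```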
If we had a canonical formulation of the argument for an AI moratorium, the quality of discourse would immediately, immensely improve.
Instead of constantly talking past each other, retreading old ground, and spending large amounts of mental effort just trying to figure out what exactly the argument for a moratorium even is, someone can say “my issue is with P6”. Their interlocutor would respond “What’s your issue with the argument for P6?”, they would reply “Subpremise 4, because it’s question-begging”, and then they would be in the perfect position for an actually very productive conversation!
I’m shocked that this project has not already been carried out. I’m happy to lead such a project if anyone wants to fund it.
With pre-RLVR models we went from a 36 second 50% time horizon to a 29 minute horizon.
Between GPT-4 and Claude-3.5 Sonnet (new) we went from 5 minutes to 29 minutes.
I’ve looked carefully at the graph, but I see no signs of a plateau, nor even of a slowdown.
I’ll do some calculation to ensure I’m not missing anything obvious or deceiving myself.
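Here’s a minimal version of that calculation, taking the horizons above together with the models’ public release dates (the mapping of model names to dates is my own; treat it as approximate):

```python
# Implied METR doubling time between two (release date, horizon) points.
from datetime import date
from math import log2

def doubling_months(t0: date, h0: float, t1: date, h1: float) -> float:
    """Months per doubling implied by two time-horizon measurements."""
    months = (t1 - t0).days / 30.44  # average month length
    return months / log2(h1 / h0)

# GPT-4 (Mar 2023, ~5 min) -> Claude 3.5 Sonnet (new) (Oct 2024, ~29 min)
print(f"{doubling_months(date(2023, 3, 14), 5, date(2024, 10, 22), 29):.1f}")
# -> 7.6 months per doubling: close to METR's original ~7-month trend,
#    i.e. no sign of a post-GPT-4 plateau.
```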
I don’t see any sign of a plateau here. Things were a little behind-trend right after GPT-4, but of course there will be short behind-trend periods just as there will be short above-trend periods, even assuming the trend is projectable.
I’m not sure why you are starting from GPT-4 and ending at GPT-4o. Starting with GPT-3.5 and ending with Claude 3.5 (new) seems more reasonable, since these were all post-RLHF, non-reasoning models.
AFAIK the Claude-3.5 models were not trained based on data from reasoning models?
I don’t think there was a plateau. Is there a reason you’re ignoring Claude models?
Greenblatt’s predictions don’t seem pertinent.
There’s a high bar to clear here: LLM capabilities have so far progressed at a hyper-exponential rate with no signs of a slowdown [1].
7-month doubling time (early models)
5.7-month doubling time (post-GPT-3.5)
4.2-month doubling time (post-o1)
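To make concrete how much these fits differ in their implications, here’s a quick sketch (the doubling times are those just listed; the illustrative 1-hour to 40-hour capability jump is my own choice):

```python
# Months needed to grow a 1-hour time horizon into a 40-hour (roughly
# one-work-week) horizon under each fitted doubling time.
from math import log2

for dt_months in (7.0, 5.7, 4.2):
    print(f"{dt_months}-month doubling: {dt_months * log2(40 / 1):.0f} months")
# -> 37, 30, and 22 months respectively: the faster recent fits compress
#    the same capability jump by more than a year.
```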
So, an argument for the claim that we’re about to plateau has to be more convincing than induction from this strong pattern we’ve observed since at least the release of GPT-2 in February 2019.
Your argument does not pass this high bar. You have made the same kind of argument that has been made again and again (and proven wrong again and again) throughout the past seven years of scaling up GPTs.
One can’t simply point out that the things LLMs currently cannot do are hard in a way that the things they currently can do are not. Of course the things they cannot do are different from the things they can. That has been equally true of the capability gains we have observed so far, so it cannot be used as evidence that this observed pattern is unlikely to continue.
So, you would need to go further. You would need to demonstrate that they’re different in a way that meaningfully departs from how past, successfully gained capabilities differed from earlier ones.
To make this more concrete, claims based on supposed architectural limitations are not an exception to this rule: many such claims have been made in the past and proven incorrect. The base rate here is unfavourable to the pessimist.
Even solid proofs of fundamental limitations are not by their nature sufficient: these tend to be arguments that LLMs cannot do X by means Y, rather than arguments that LLMs cannot do X.
To be convincing, you have to make an argument that fundamentally differentiates your objection from past failed objections.
[1] Based on METR’s research: https://metr.org/blog/2025-03-19-measuring-ai-ability-to-complete-long-tasks/
We hit a bit of an inflection point starting in late November ’25 where AI systems provide decent uplift to engineers. Maybe a 5–10% uplift in total factor productivity for frontier AI research lab engineers. So that’s directly applicable to the rate of algorithmic progress, training run supervision, etc. but not overall progress (because increasing programming productivity doesn’t directly increase compute).
I expect uplift to increase superlinearly (and increasingly so) with model time horizon, and model time horizon to increase hyperexponentially (current doubling time of 2–3 months). Uplift with Opus 4.6 (released Feb 5) is probably about 20%. So uplift will increase hyperexponentially, more than doubling every few months, then every few weeks...
Uplift should reach a few hundred percent by Q4 this year (although uplift will obsolesce pretty quickly as a concept as models increasingly work independently of engineers). Then, conditional on limited compute not being too much of a constraint (and I don’t think it will be), we’ll get a singularity-style intelligence explosion before the end of the year. The timeline is pretty sensitive to the exact values of all these parameters.
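A toy compounding sketch of that forecast (the ~20% February figure and the 2–3 month doubling range are from the comment above; the fixed midpoint doubling time and the month indexing are my simplifications):

```python
# Project engineer uplift forward under a fixed uplift doubling time.
uplift, month = 0.20, 2        # ~20% around Feb (Opus 4.6, per the comment)
doubling_months = 2.5          # midpoint of the stated 2-3 month range
while month < 12:              # run out to December (Q4)
    month += 1
    uplift *= 2 ** (1 / doubling_months)
print(f"December uplift: {uplift:.0%}")
# -> ~320% under a *fixed* 2.5-month doubling; any shrinking of the
#    doubling time, as expected above, lands higher still.
```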