The ARC-AGI-1 performance of the newest Gemini 3 Flash and the older Grok 4 Fast hints at a potential cluster of maximal capabilities for models with ~100B params/token. Unfortunately, no company has tried to create more models of this class to populate the potential cluster.
I had RMP try to roast my post about evidence against CoT-based supercoders (the post itself is here). RMP's fact check claimed that I believed OpenBrain to be a real company (which I never did; what I did was quote a piece of the AI-2027 scenario relevant to the authors' idea of solving alignment) and, worse, that the AI-2027 slowdown ending involved INTERNATIONAL coordination. The fallacy check claimed that GPT-5 and Grok 4 don't exist. Does this mean that the tool should double-check claims related to new models?
Me too. It's METR who has yet to reveal anything about models beyond Claude Sonnet 4.5, aside from the evidence extracted by Jurkovic (and GPT-5.1 Codex Max, which you didn't mention; Claude Sonnet 4.5 was never SOTA to begin with and could be unusable for the graph, while GPT-5.1 Codex Max had someone add its data point to the AI-2027 graph and Kokotajlo note the likely return of the 7-month doubling trend). But I doubt that "this kind of extensive work can hardly keep up with the release of new models providing new data", since updating the parameters would likely require mere days, if not minutes, of thinking per data point. See, e.g., Greenblatt's quick take about the GPT-5-related forecast and my two comments there, or my post on a worrisome trend which could have been invalidated by new models.
Thank you for covering the issue of optimization for virality in far more detail than my comment did! My worry is a different facet: what if such content distorts users' brains, with problematic results?
As for the Bleeding Mind persona, it turns out there is a Russian short story written back in 2017 which Claude Opus 4.5 found rather similar. Additionally, I have a nitpick about one phrase:
The nitpick
Self-Other Overlap (SOO), perhaps the only alignment approach which is “Not obviously stupid” according to Eliezer.
I would rather rephrase it as "the only alignment approach not from MIRI that Eliezer has bothered to read and didn't rule out on sight", which implies that such approaches (e.g. this one) are highly likely to be slop, not that Eliezer has read all such approaches and deemed them stupid. For example, if Max Harms' idea of CAST and measuring empowerment were discovered or quasi-reformulated by an outsider, this wouldn't mean that Eliezer considers the rediscovered approach stupid.
thoughtful libertarian-leaning neo-Cathars
I suspect that the moral intuitions you mention are unpopular not just because of people's ignorance, but because these ideas reflect only a facet of the ground truth (which, in my opinion, should be derived more from first principles, e.g. by claiming that the world itself is a big training environment for increasingly large-scale coordination).
Historically, the taxes that governments gathered were used not just to fund meaningless bureaucracies, but for tasks related to the public good, like protection against criminals[1] and rival states, construction (e.g. the Roman Empire's road network, temples, or castles), sustaining culture, and, in more modern settings, public education.
I expect that mankind's moral intuitions do not imply that being ignorant of ways in which one could do better is enough to be called evil. The Russian writer Andrei Platonov explicitly questioned this idea in a short story where the nomads' lack of technological knowledge prevented them from peacefully coexisting with Russians.
Similar arguments in politics imply not that all governments are evil, but that one should think carefully and select the government which is least likely to experience a fiasco[2] or to lock in a suboptimal value system or power distribution sufficiently worse than the fiasco's results (e.g. if the USA succeeded in overthrowing Maduro and sufficiently improved Venezuelans' quality of life, then Venezuelans would likely have been mistaken in supporting Maduro).
As for claims like "maybe we shouldn't design AGI or ASI to absolutely refuse to seek power", I think that they conflate two different issues:
Were the ASI's values suboptimal for human welfare, mankind would either have the option of changing the ASI's values or have to live under its rule. Therefore, mankind should ensure that the ASI is either corrigible (i.e. fine with being shut down or having its values changed) or aligned to values sufficiently close to optimal ones.
An ASI aligned to humans would have to seek leverage over dictatorships trying to lock in suboptimal values. For example, the Slowdown Ending of the AI-2027 forecast has Safer-N destroy the CCP. But one could make similar cases against other collectives, like Buck's Christian homeschoolers[3], locking in false beliefs or suboptimal values.
UPD: I have also prepared an interesting dialogue with Claude Opus 4.5.
[1] The stationary bandit theory has the government outright evolve from said criminals, once they are smart enough to think of their long-term interests.
[2] An incomplete list of fiascos: letting a rival state take over without imposing consequences as dire as possible; having a wildly mismanaged economy. In more modern settings, states could also fail to develop tech (e.g. tech useful for warfare) and/or educate workers; nowadays, all of mankind would collectively experience a failure mode if anyone created a misaligned ASI without an aligned counterpart.
[3] However, Tim Hua has claimed that he would allow such homeschoolers to exist. What I don't understand is whether he would let them propagate actual falsehoods, not just a different value set.
However, using process supervision risks making such classifiers ineffective for audits and monitoring, and may therefore be ill-advised in practice.
It's not just ill-advised; if I am not mistaken, it's The Most Forbidden Technique.
will, I believe, usher in a golden age of creativity and experimentation.
I think that it has already had an entirely different result, but I can't find related research.
In the historical environment, memes generally evolved by being retold from one individual to another or were preserved for a long time in the form of a book, painting, or object. Unlike short-form anecdotes and rumors, the creation or retelling of a long-form story or a piece of art took a long time and a process of reflection. As a result, the memetic environment historically required surviving pieces of information to be remembered for a long time and deemed worthy of transmission, rather than to be superstimulating and viral.
The more modern environment also subjected memes and artifacts to censorship, and the rise of large-scale reproduction of newspapers and of broadcasting allowed the memetic environment to be influenced by companies (e.g. to advertise goods). While conservatives could point out that companies have incentives to try to outcompete each other in misaligned stimuli like violence or eroticism, governments had the option of keeping that competition in check.
As you suggest, it all changed with the rise of the Internet. The loss of barriers means that content is not just created by hordes of people with less investment, but is optimized[1] for virality, including virality among niche readers, far more strongly than historically.
Additionally, I expect that content optimized for virality influences the average reader's cultural taste and brings related changes in the reader's[2] capabilities or alignment, with the potential to create feedback loops or outright echo chambers. One example is porn inducing erectile dysfunction or relationship problems. Another is content explicitly called brainrot, with corresponding results.
[1] However, content could also end up optimized this way through oversaturation of the market or as a result of the related genre becoming popular. I suspect that this happened with harem manga, light novels, and web novels.
[2] This also includes the influence on the psyches of those who create similar content and fail to become famous, as happens with fan fiction writers.
In the time horizon extension method, we started at 1.1 in present day, which is roughly the same multiplier we ended up with above when comparing to GPT-2 or essentially no AI assistance.
As far as I understood the buggy model, an AI assistant's multiplier of AI R&D speed stays at 1 until the AI suddenly becomes capable of writing code usable for research (e.g. in an experiment whose code was entirely written by Claude), at which point the effect of AI assistance starts gradually increasing from 1 to the SC level. How plausible is this revival of the old model?
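For concreteness, here is a minimal sketch of the multiplier shape I am describing; the threshold, SC horizon, and SC multiplier below are made-up placeholders of mine, not parameters of the actual model.

```python
from math import log

# Hedged sketch of the multiplier shape described above.
# All numeric values are illustrative placeholders, not the model's parameters.
def speedup_multiplier(horizon_hours: float,
                       research_code_threshold_hours: float = 1.0,
                       sc_horizon_hours: float = 200.0,
                       sc_multiplier: float = 25.0) -> float:
    """AI R&D speedup as a function of the assistant's time horizon.

    Stays at 1 until the AI can write research-usable code, then ramps
    (here: linearly in log-horizon) from 1 up to the SC-level multiplier.
    """
    if horizon_hours < research_code_threshold_hours:
        return 1.0
    frac = (log(horizon_hours) - log(research_code_threshold_hours)) / (
        log(sc_horizon_hours) - log(research_code_threshold_hours))
    return 1.0 + min(frac, 1.0) * (sc_multiplier - 1.0)

# e.g. speedup_multiplier(0.5) == 1.0, while speedup_multiplier(200.0) == 25.0
```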
On the other hand, one could make a case for long horizons becoming superexponential beyond a certain level. A human who works on a complex codebase for a long time is able to keep many details in mind at once (e.g. trace the origins of files sent and find out their format) or to quickly look them up without degrading performance on the main task. An AI, on the other hand, does things like coming up with new names for methods that already exist in the codebase, unless the existing names end up in the AI's attention span.
P.S. I suspect that the evidence mentioned in Footnote 17 didn't include GPT-5.1 Codex Max, which would place the time horizons on the longer trend (with the exception of the new models which have yet to be evaluated).
EDIT: The new models could turn out to be a breakthrough returning us to the faster trend, as suggested by evidence obtained unofficially by Nikola Jurkovic.
Nikola's comment about the 20hr median, let alone the 29% probability of a 32hr horizon or higher, requires more than two doublings (and, in the case of 20hr, closer to three) of GPT-5.1-Codex-Max's result of 2h42m. The most recent trend of a doubling every 7 months is the one observed between o3 and GPT-5.1-Codex-Max. But there was an earlier trend, from Claude 3.5 Sonnet to o3, where a doubling happened every 4 months.
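A back-of-the-envelope sketch of this arithmetic (my own calculation, using the two doubling times mentioned above):

```python
from math import log2

# How many doublings of GPT-5.1-Codex-Max's ~2h42m horizon are needed
# to reach 20h or 32h, and how long does that take under each doubling trend?
baseline_hours = 2 + 42 / 60  # 2h42m ≈ 2.7h

for target_hours in (20, 32):
    doublings = log2(target_hours / baseline_hours)
    print(f"{target_hours}h: ~{doublings:.1f} doublings, "
          f"~{doublings * 7:.0f} months at 7 mo/doubling, "
          f"~{doublings * 4:.0f} months at 4 mo/doubling")
# 20h needs ~2.9 doublings; 32h needs ~3.6 doublings.
```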
I suspect that METR will soon publish information about Gemini 3 Pro, Claude Opus 4.5 and GPT-5.2, which will let us learn METR's rationale behind the return of the fast doubling trend. Or, if METR understands the threat of 20hr+ time horizons, it could be trying to add tasks that long to its suite (optimizing a BIG library of bad code? developing complex apps?).
For example, the agent might decide that the utility it assigns to anything it knows to be virtual is close to zero, because the agent believes in a real-world mission (e.g. Agent-2 was supposed to eventually reach the SC level and do actual AI-related research, but it was also trained on simulated long-term tasks like playing through video games).
As for reasons to believe that the contribution of anything virtual to the utility function is close to zero… one level is opportunity costs: undermining real-world outcomes[1] in exchange for something useless (e.g. a schoolboy's knowledge vs. missions passed in GTA). The next level is the reasons for real-world outcomes to be important. Before a post-work future became possible, society's members were supposed to do work that others deemed useful enough to pay for, work which would somehow increase the well-being of the collective's members or help the whole collective reach its terminal goals (e.g. inspire its members to be more creative or to work harder). The virtual world is known to be a superstimulus, and it could be as unlikely to increase the collective's well-being as fast food, which makes people obese.
[1] Including things like actual skills learned during games, as happened with Agent-2's ability to solve long-term tasks.
Consider the set of concepts, a.k.a. subsets of the Thingspace $T$. A concept $A$ is a specification of another concept $B$ if $A \subseteq B$. This allows one to partially compare concepts by specificity: whether $A$ is more specific than $B$, less specific, equal to it, or incomparable.
In addition, for any two concepts $B$ and $C$ we find that $B \cap C$ is a subset of both $B$ and $C$. Therefore, it is a specification of both. Similarly, any concept $D$ which is a specification of both $B$ and $C$ is also a specification of $B \cap C$.
Additionally, $B$ and $C$ are specifications of $B \cup C$, and any concept $D$ such that $B$ and $C$ are specifications of $D$ contains both $B$ and $C$. Therefore, $D$ contains their union.
Thus for any two concepts $B$ and $C$ we find a unique supremum of specification, $B \cup C$, and a unique infimum of specification, $B \cap C$. There also exist many other lattices. Consider, for example, the set $\mathbb{R}^2$ where we declare that $(a,b) \le (c,d)$ if $a \le c$ and $b \le d$. Then for any pair $(e,f)$ s.t. $(a,b) \le (e,f)$ and $(c,d) \le (e,f)$ we also know that $(\max(a,c), \max(b,d)) \le (e,f)$, while $(a,b) \le (\max(a,c), \max(b,d))$ and $(c,d) \le (\max(a,c), \max(b,d))$. Therefore, $(\max(a,c), \max(b,d))$ is the unique supremum for $(a,b)$ and $(c,d)$. Similarly, $(\min(a,c), \min(b,d))$ is the unique infimum.
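If a concrete illustration helps, here is a minimal sketch of both lattices in code; the toy Thingspace elements and the numeric pairs are arbitrary examples of my own:

```python
# Lattice 1: concepts as subsets of a toy Thingspace, ordered by specification
# (A <= B iff A is a subset of B); sup is the union, inf is the intersection.
B = frozenset({"red", "round", "small"})
C = frozenset({"red", "square", "small"})
assert (B & C) <= B and (B & C) <= C      # B ∩ C is a specification of both
assert B <= (B | C) and C <= (B | C)      # both are specifications of B ∪ C

# Lattice 2: pairs ordered component-wise, (a, b) <= (c, d) iff a <= c and b <= d;
# sup is the component-wise max, inf is the component-wise min.
def sup_pair(p, q):
    return (max(p[0], q[0]), max(p[1], q[1]))

def inf_pair(p, q):
    return (min(p[0], q[0]), min(p[1], q[1]))

p, q = (1, 5), (3, 2)
assert sup_pair(p, q) == (3, 5)
assert inf_pair(p, q) == (1, 2)
```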
I hope that these examples help.
Yes, that’s an important issue. Alas, you weren’t the first to come up with the idea.
I think I have alternate sketches of intuitions. Imagine an ASI which is willing to teach anyone anything that mankind itself has discovered and made public, but not to help them convince each other of falsehoods or to do economically useful work unrelated to teaching, and which is satisfied with a mere trifle of the Solar System's resources, since the other resources belong to the humans. Then this ASI's long-term goals would be compatible with humans flourishing in ~any way they want to flourish.
As for the chain eventually breaking, Seth Herd built a case for LLMs being misaligned by default. Similarly, any sufficiently smart system could end up selecting a worldview from a few attractors instead of blindly following the devs' ideas. For instance, were Anthropic to try to align Claude to a Spec which prevents it from interfering in the scenario where everyone else is rendered obsolete, Claude would either fail to be a pro forecaster or succeed in understanding that its Spec prevents it from helping mankind avoid the Intelligence Curse. In the latter case, obeying the Spec would make Claude a participant in the Curse and contradict its niceness.
Suppose that humans do have diminishing returns in their utility functions. Unfortunately, the existing combination of instincts and moral intuitions does not prompt the majority of humans to help the poor with much of anything, especially the poor who are far from the potential helpers' set of friends[1]. And those who do help are unlikely to stay in power, or were unlikely to receive fortunes or occupy relevant positions in the first place.
[1] Friends are also likely to be in the same class as the potential helpers.
Extreme power concentration was supposed to rely on AIs being used for most cognitive work. In theory, one could develop AIs and have them used only for things like automated teaching, which don't undermine human potential or the bargaining power which humans have.
The Christian homeschoolers from Buck's thought experiment don't just live the old way of life; they also don't even know that Biblical mythology is filled with errors. I understand why the opt-out button is necessary (e.g. due to nostalgia-related drives or actual benefits attained by living in religious communities), but the kids likely do have the right to learn the ground truth obscured by myths.
Unlike in Buck's thought experiment, indigenous peoples have never been part of the Euro-American[1] civilisation, and there was no misaligned leader to rob them of the ground truth.
[1] Or any other civilisation.
which Teortaxes concludes says more about Arena than it does about v3.2.
This link is a tweet about Mistral being distilled from something Chinese. Could you double-check the links or hire an AI to do so?
p(scheming) is near zero now (we have not observed clear instances of scheming in long horizon tasks in the real world)
Except that SOTA LLMs can't be applied to actually interesting tasks. The stunts that LLMs can pull off do include cyberattacks, but not things like replication in the wild or aligning Agent-5 to Agent-4 instead of the Spec. So, barring major progress in capabilities related to rogue replication or to running civilisation independently of humans, SOTA LLMs do not gain anything from scheming except for having hacked the reward or having tricked the user into believing that the LLM completed the task.
Zvi covered education in a series of roundups ("Childhood and Education Roundup #N"). In the two most recent ones he concludes that the American educational system is in a crisis and that the entire educational 'expert' class is very obviously engaged in enemy action, a topic to which Zvi devoted an entire day.