Nesov notes that making use of bigger models (i.e. 4T active parameters) is heavily bottlenecked on the HBM on inference chips, as is doing RL on bigger models. He expects it won’t be possible to do the next huge pretraining jump (to ~30T active) until ~2029.
HBM per chip doesn’t matter; what matters is HBM per scale-up world. A scale-up world is a collection of chips with sufficiently good networking between them that inference for large models can be set up across them with good utilization. For H100/H200/B200, a scale-up world is 8 chips (1 server; typically 4 servers per rack); for GB200/GB300 NVL72, it’s 72 chips (1 rack, 140 kW); and for Rubin Ultra NVL576, it’s 144 chips (also 1 rack, but 600 kW).
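To make "HBM per scale-up world" concrete, here is a back-of-envelope sketch for two of the systems mentioned above. The per-chip HBM capacities are approximate public figures and should be treated as illustrative assumptions, not exact specs:

```python
# Approximate HBM per scale-up world (per-chip GB figures are assumptions
# based on public specs, not authoritative numbers).
configs = {
    "H100 (8-chip server)": (8, 80),     # ~80 GB HBM3 per GPU
    "GB300 NVL72 (1 rack)": (72, 288),   # ~288 GB HBM3e per GPU
}

for name, (chips, hbm_gb) in configs.items():
    total_tb = chips * hbm_gb / 1000
    print(f"{name}: ~{total_tb:.1f} TB HBM per scale-up world")
```

The jump from a sub-terabyte 8-GPU server to a ~20 TB rack is the reason per-chip HBM is the wrong unit of analysis.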
use of bigger models (i.e. 4T active parameters) is heavily bottlenecked on the HBM
Models don’t need to fit into a single scale-up world (spanning a few should be fine), and note that the KV cache wants at least as much memory as the model itself. So you are only in trouble once the model is much larger than a scale-up world, in which case you’ll need so many scale-up worlds that you’ll effectively be using the scale-out network for scaling up. That will likely degrade performance and make inference more expensive (compared to the magical hypothetical with larger scale-up worlds, which aren’t necessarily available, so this might still be the way to go). And this is about total params, not active params, though active params indirectly determine the size of the KV cache per user.
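The sizing argument above can be sketched numerically: count weights plus a KV-cache budget at least as large as the weights, and divide by the HBM of one scale-up world. The defaults here (FP8 weights, a GB300-NVL72-like ~20.7 TB rack) are illustrative assumptions:

```python
import math

def scale_up_worlds_needed(total_params_t, bytes_per_param=1,
                           kv_multiplier=1.0, world_hbm_tb=20.7):
    """total_params_t: total (not active) parameters, in trillions.
    bytes_per_param=1 assumes FP8 weights; kv_multiplier=1.0 reserves
    as much HBM for KV cache as for the weights themselves.
    world_hbm_tb defaults to a GB300-NVL72-like rack (~20.7 TB)."""
    weights_tb = total_params_t * bytes_per_param  # 1T params at 1 B/param = 1 TB
    needed_tb = weights_tb * (1 + kv_multiplier)
    return math.ceil(needed_tb / world_hbm_tb)

# A 4T-total-param model fits in one such rack (8 TB needed vs ~20.7 TB);
# a 30T-total-param model needs ~60 TB, so it spans several racks and
# starts leaning on the scale-out network.
print(scale_up_worlds_needed(4))
print(scale_up_worlds_needed(30))
```

This is exactly the "much larger than a scale-up world" regime: the model itself isn't the problem until weights plus KV cache overflow a handful of racks.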
He expects it won’t be possible to do the next huge pretraining jump (to ~30T active) until ~2029.
Nvidia’s GPUs probably won’t be able to efficiently run inference for models with 30T total params (rather than active) until about 2029 (maybe late 2028), when enough Rubin Ultra NVL576 capacity is built. But gigawatts of Ironwood TPUs are being built in 2026, including for Anthropic, and those TPUs will be able to serve inference for such models (for large user bases) in late 2026 to early 2027.
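A rough comparison makes the TPU point concrete. A hypothetical 30T-total-param model at FP8 is ~30 TB of weights, plus at least as much again for KV cache, so ~60 TB. The Ironwood pod sizes and per-chip HBM below are approximate public figures and are assumptions for illustration:

```python
# Hypothetical 30T-total-param model: 30T params * 1 byte (FP8) * 2x for KV cache.
model_hbm_tb = 30 * 1 * 2  # ~60 TB

# Per-chip HBM and pod sizes are approximate public numbers (assumptions).
systems = {
    "GB300 NVL72 rack": 72 * 288 / 1000,        # ~20.7 TB
    "Ironwood 256-chip pod": 256 * 192 / 1000,  # ~49 TB
    "Ironwood 9216-chip pod": 9216 * 192 / 1000,  # ~1770 TB
}

for name, hbm_tb in systems.items():
    verdict = "fits" if hbm_tb >= model_hbm_tb else "needs multiple"
    print(f"{name}: ~{hbm_tb:.0f} TB HBM, {verdict} "
          f"for a ~{model_hbm_tb} TB model footprint")
```

Under these assumptions, a single Nvidia rack falls well short of the footprint, while large Ironwood pods clear it with room to spare, which is the asymmetry the paragraph above is pointing at.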
This demands that others agree with you, for reasons that shouldn’t compel them to agree (in this sentence, rhetoric alone). They don’t agree; that’s the current situation. Appealing to “in reality we are all sitting in the same boat” and “you in fact have as much reason as I do to try to work towards a solution” signals to them that you are ignoring their point of view on what facts hold in reality, which breaks the conversation.
It would be productive to take claims like this as premises and discuss the consequences (to distinguish x-risk-in-the-mind from x-risk-in-reality). But taking disbelieved premises seriously and running with them (for non-technical topics) is not a widespread skill you can expect to encounter often in the wild, unless perhaps you’ve cultivated it in your acquaintances.