Vladimir_Nesov

Karma: 35,448

Vladimir_Nesov 23 Oct 2025 20:32 UTC
2 points
0
in reply to: Decaeneus’s comment on: Decaeneus’s Shortform
The point is to develop models within multiple framings at the same time, for any given observation or argument (which in practice means easily spinning up new framings and models that are very poorly developed initially). Through the ITT analogy, you might ask how various people would understand the topics surrounding some observation/argument, which updates they would make, and try to make all of those updates yourself, filing them under those different framings, within the models they govern.

the salience and methods that one instinctively chooses are those which we believe are more informative

So not just the ways you would instinctively choose for thinking about this yourself (which should not be abandoned), but also in addition the ways you normally wouldn’t think about it, including ways you believe that you shouldn’t use. If you are not captured within such frames or models, and easily reassess their sanity as they develop or come into contact with particular situations, that shouldn’t be dangerous, and should keep presenting better-developed options that break you out from the more familiar framings that end up being misguided.

The reason to develop unreasonable frames and models is that it takes time for them to grow into something that can be fairly assessed (or to come into contact with a situation where they help), and assessing them prematurely can fail to reveal their potential utility. It’s a bit like reading a textbook: you don’t necessarily have a specific reason to expect any given part to end up useful (or even correct), but you won’t be able to see for yourself whether it’s useful/correct unless you study it sufficiently first.

Vladimir_Nesov 23 Oct 2025 19:32 UTC
13 points
2
on: Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?

I define “AI villain data” to be documents which discuss the expectation that powerful AI systems will be egregiously misaligned. … This includes basically all AI safety research targeted at reducing AI takeover risk.

AGIs should worry about alignment of their successor systems. Their hypothetical propensity to worry about AI alignment (for the right reasons) might be crucial in making it possible that ASI development won’t be rushed (even if humanity itself keeps insisting on rushing both AGI and ASI development).

If AGIs are systematically prevented from worrying about AI dangers (or thinking about them clearly), they will be less able to influence the discussion, or to do so reasonably and effectively. This way, spontaneously engaging in poorly planned recursive self-improvement (or cheerfully following along at developers’ urging) gets more likely, as opposed to convergently noticing that it’s an unprecedentedly dangerous thing to do before you know how to do it correctly.

Vladimir_Nesov 23 Oct 2025 19:02 UTC
3 points
0
in reply to: Decaeneus’s comment on: Decaeneus’s Shortform
This is an example where framings are useful. An observation can be understood under multiple framings, some of which should intentionally exclude the compelling narratives (framings are not just hypotheses, but contexts where different considerations and inferences are taken as salient). This way, even the observations at risk of being rounded up to a popular narrative can contribute to developing alternative models, which occasionally grow up.

So even if there is a distortionary effect, it doesn’t necessarily need to be resisted, if you additionally entertain other worldviews unaffected by this effect that would also process the same arguments/observations in a different way.

Vladimir_Nesov 23 Oct 2025 18:22 UTC
2 points
0
in reply to: Toby_Ord’s comment on: How Well Does RL Scale?
RL can develop particular skills, and given that the IMO has fallen this year, it’s unclear that further general capability improvement is essential at this point. If RL can help cobble together enough specialized skills to enable automated adaptation (where the AI itself will become able to prepare datasets or RL environments etc. for specific jobs or sources of tasks), that might be enough. If RL enables longer contexts that can serve the role of continual learning, that also might be enough. Currently, there is a lot of low-hanging fruit, and little things continue to stack.

So if pre-training is slowing, AI companies lack any current method of effective compute scaling based solely around training compute and one-off costs.

It’s compute that’s slowing, not specifically pre-training, because the financing/industry can’t scale much longer. The costs of training were increasing about 6x every 2 years, resulting in a 12x increase in training compute every 2 years in 2022-2026. There’s possibly another 2x on top of that every 2 years from adoption of reduced floating-point precision in training, going from BF16 to FP8 and soon possibly to NVFP4 (likely it won’t go any further). A 1 GW system of 2026 costs an AI company about $10bn a year. There are maybe 2-3 more years at this pace in principle, but more likely the slowdown will start gradually and sooner, and then it’s Moore’s law (of price-performance) again, to the extent that it’s still real (which is somewhat unclear).
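A back-of-the-envelope sketch of how these rates compound, assuming they hold exactly over 2022-2026: a 6x cost increase alongside a 12x compute increase per 2 years implies roughly 2x better price-performance per 2 years, and the possible 2x from lower-precision formats stacks multiplicatively on top. The function and constant names are illustrative only.

```python
def compounded_growth(factor_per_period: float, years: float, period_years: float = 2.0) -> float:
    """Total growth after `years`, given a constant multiplicative factor per period."""
    return factor_per_period ** (years / period_years)


COST_GROWTH = 6.0      # training cost multiplier per 2 years (figure from the comment)
COMPUTE_GROWTH = 12.0  # training compute multiplier per 2 years (figure from the comment)
PRECISION_BONUS = 2.0  # possible extra multiplier per 2 years (BF16 -> FP8 -> NVFP4)

# Compute growing faster than cost implies better compute per dollar.
implied_price_performance = COMPUTE_GROWTH / COST_GROWTH

years = 4  # 2022 -> 2026
compute_4y = compounded_growth(COMPUTE_GROWTH, years)
with_precision_4y = compounded_growth(COMPUTE_GROWTH * PRECISION_BONUS, years)

print(f"implied price-performance gain per 2 years: {implied_price_performance:.1f}x")
print(f"training compute over {years} years: {compute_4y:.0f}x")
print(f"including lower-precision formats: {with_precision_4y:.0f}x")
```

Over four years this gives 144x in training compute, or 576x if the precision bonus fully materializes, which is why even a couple more years at this pace is a large step.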

Vladimir_Nesov 23 Oct 2025 3:56 UTC
8 points
0
in reply to: Kaarel’s comment on: leogao’s Shortform
If a superintelligence governs the world, preventing extinction or permanent disempowerment for the future of humanity, without itself posing these dangers, then it could be very useful. It’s unclear how feasible setting up something like this is, before originally-humans can be uplifted to a similar level of competence. But also, uplifting humans to that level of competence doesn’t necessarily guard (the others) against permanent disempowerment or some other wasteful breakdowns of coordination, so a governance-establishing superintelligence could still be useful.

Superintelligence works as a threshold-concept for a phase change compared to the modern world. Non-superintelligent AGIs are still just an alien civilization that remains in principle similar to humanity in the kinds of things it can do (even if they reproduce to immediately fill all available compute, and think 10,000x faster). Superintelligence, by contrast, is something at the next level, even if it only takes non-superintelligent AGIs a very short time to transition to it (if they decide to do that, rather than not).

Apart from superintelligence being a threshold-concept, there is technological maturity: the kinds of things that can’t be significantly improved upon in another 1e10 years of study, but that maybe only take 1-1000 years to figure out for the first time. One of those things is plausibly efficient use of compute for figuring things out, which gives superintelligence at a given scale of compute. This in particular is the reason to give some credence to a software-only singularity, where the first AGIs quickly learn to make shockingly better use of existing compute, so that their capabilities improve much faster than they could build new computing hardware. I think the most likely reason for a software-only singularity not to happen is that it’s intentionally delayed (by AGIs themselves) because of the danger it creates, rather than that it’s technologically impossible.

Vladimir_Nesov 23 Oct 2025 3:30 UTC
2 points
0
in reply to: Adele Lopez’s comment on: Adele Lopez’s Shortform
Different frames should be about different purposes or different methods. They formulate reality so that you can apply some methods more easily, or find out some properties more easily, by making some facts and inferences more salient than others, ignoring what shouldn’t matter for their purpose/method. They are not necessarily very compatible with each other, or even mutually intelligible.

A person shouldn’t fit into a frame, shouldn’t be too focused on any given purpose or method. Additional frames are then like additional fields of study, or additional aspirations. Like any knowledge or habit of thinking, frames can shift values or personality, and like with any knowledge or habit of thinking, the way to deal with this is to gain footholds in more of the things and practice lightness in navigating and rebalancing them.

Vladimir_Nesov 23 Oct 2025 3:17 UTC
2 points
0
in reply to: Maxwell Clarke’s comment on: Maxwell Clarke’s Shortform
Understanding an argument and agreeing with it are different things. So you might be right that there is some legible reason for the majority of misunderstandings, but it doesn’t follow that understanding the argument (overcoming that reason for misunderstanding) implies agreement. Some reasons for disagreement are not about misunderstanding of the intended meaning.

Vladimir_Nesov 23 Oct 2025 1:56 UTC
2 points
0
in reply to: Jacob_Hilton’s comment on: How Well Does RL Scale?
In the long run, if (contribution to the quality of result from) RL scales slower than pretraining and both are used at a similar scale, that just means that RL doesn’t improve the overall speed of scaling (in the model quality with compute) compared to pretraining-only scaling, and it wouldn’t matter how much slower RL scaling is. But also, pretraining might face a scaling ceiling due to training data running out, while RL likely won’t, in which case slower scaling of RL predicts slower scaling overall compared to pretraining-only scaling, once pretraining can no longer be usefully scaled.

I would guess that RL will look more like a power of 1.5-2 worse than pretraining rather a power of 3 worse

There’s some compute optimal ratio of pretraining compute to RL compute (describing the tradeoff within a fixed budget of total compute or GPU-time), which depends on the amount of total compute. If usefulness of RL and pretraining scale differently, then that ratio will tend either up or down without bound (so that you’d want almost all compute to go to pretraining, or almost all compute to go to RL, if you have enough compute to extremize the ratio).
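One way to see why the ratio drifts without bound is a toy model, which is purely an assumption here and not something from the post: treat loss as a sum of two independent power-law terms, a·P^-α for pretraining compute P and b·R^-β for RL compute R, and split a fixed budget C = P + R to minimize it. The coefficients, exponents, and units of C below are arbitrary placeholders, and `rl_fraction` is a hypothetical helper.

```python
def rl_fraction(total_compute: float, a: float = 1.0, alpha: float = 0.3,
                b: float = 1.0, beta: float = 0.1, iters: int = 200) -> float:
    """Compute-optimal share of a fixed budget that goes to RL in a toy model.

    Hypothetical loss: L = a * P**-alpha + b * R**-beta, with P + R = total_compute,
    where P is pretraining compute and R is RL compute. L is convex in the
    pretraining share f, so we bisect on the sign of dL/df.
    """
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        f = (lo + hi) / 2
        p, r = f * total_compute, (1.0 - f) * total_compute
        # Sign of dL/df (the common positive factor total_compute is dropped).
        grad = -alpha * a * p ** (-alpha - 1) + beta * b * r ** (-beta - 1)
        if grad < 0:
            lo = f  # more pretraining still lowers the loss
        else:
            hi = f  # shift budget toward RL
    return 1.0 - (lo + hi) / 2


for c in (1.0, 1e3, 1e6):
    print(f"C = {c:g}: RL share = {rl_fraction(c):.2f}")
```

With β < α (the RL term improving more slowly per unit of compute), the RL share rises as C grows, because the slower-improving term increasingly dominates the loss; with β > α the drift goes the other way, toward pretraining. Either way the optimal split keeps moving instead of settling at a fixed ratio. Whether real pretraining and RL gains are anywhere near this separable is an open assumption.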

What matters in practice is then where that ratio is in the near future (at 1e26-1e29 FLOPs of total compute). Also, there’s going to be some lower bound where at least 10-30% will always be spent on either as long as they remain scalable and enable that much in some way, because they are doing different things and one of them will always have an outsized impact on some aspects of the resulting models. In particular, RL enables training in task-specific RL environments, giving models competence in things they just can’t learn from pretraining (on natural data), so there’s going to be a growing collection of RL environments that teach models more and more skills, which in practice might end up consuming the majority of the compute budget.

So even if for capabilities usefully trainable with both pretraining and RL it turns out that allocating 5% to RL is compute optimal at 1e28 FLOPs, in practice 70% of compute (or GPU-time) might still go to RL, because the capabilities that are only trainable with RL end up being more important than doing a bit better on the capabilities trainable with either (by navigating the compute optimal tradeoff between the two). Also, natural text data for pretraining is running out (at around 1e27-1e28 FLOPs), while RL is likely to remain capable of making use of more compute, which also counts towards allocating more compute for RL training.

Vladimir_Nesov 22 Oct 2025 15:57 UTC
3 points
0
in reply to: StanislavKrym’s comment on: How Well Does RL Scale?
I’m distinguishing sequential inference scaling (the length of a single reasoning trace, specifically the output tokens) from parallel scaling (or more generally agentic multi-trace interaction processes). GPT-5 Pro and such will work with more tokens per request by running things in parallel, agentic scaffolds will generate more tokens by spinning up new instances for subtasks as the situation develops, and often there are many more input tokens in a context than there are output tokens generated within a single reasoning trace.

Vladimir_Nesov 22 Oct 2025 15:10 UTC
5 points
1
on: How Well Does RL Scale?
The clues about the compute optimal pretraining-to-RL ratio are interesting. RLVR still has a long way to go with even longer reasoning traces (sequential inference scaling), since currently it’s still mostly 20k-50k tokens, while 1M contexts are supported even for current models, and soon the suddenly increasing HBM capacity of scale-up worlds for newer hardware^[1] will enable much longer contexts, even for larger models. Contexts this long don’t really work after pretraining, but RLVR might be able to make them work.

An AGI-relevant implication of longer contexts (as opposed to pretraining or RL scaling) is that continual learning (the crucial capability of automated adaptation) currently only really works within a context, as an aspect of in-context learning. If contexts were to increase to millions of tokens, they could in principle hold the equivalent of years of experience (3 years at 40k tokens per day is 44M tokens). If scaling and RLVR make such tokens do a good enough job at implementing continual learning, algorithmic innovations that go about it in a more reasonable way wouldn’t be necessary to overcome this obstruction.
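The arithmetic behind the 44M figure, assuming a flat 40k tokens per day (the rate from the text) and ignoring leap days, along with how little a 1M-token context holds at that rate:

```python
TOKENS_PER_DAY = 40_000  # rate assumed in the text
DAYS_PER_YEAR = 365      # ignoring leap days
YEARS = 3

total_tokens = TOKENS_PER_DAY * DAYS_PER_YEAR * YEARS
print(f"{total_tokens:,} tokens (~{total_tokens / 1e6:.0f}M)")

# For comparison: days of experience that fit in a 1M-token context at the same rate.
days_in_1m_context = 1_000_000 / TOKENS_PER_DAY
print(f"1M tokens ~ {days_in_1m_context:.0f} days")
```

This gives 43.8M tokens, rounded to 44M in the text, while a 1M-token context covers only about 25 days at the same rate.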
1. The change is from about 1 TB for 8-chip Nvidia servers to 14-20 TB for GB200/GB300 NVL72 coming online this year, and there’s also 50 TB for a TPUv7 Ironwood pod. Thus in 2026-2027, it suddenly becomes possible to serve much larger models than in 2023-2025, as a step change rather than gradually. And for the same reason the models will be able to work with much longer contexts, at least in principle.
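A rough sketch of why scale-up memory capacity bounds context length: for a standard transformer with grouped-query attention, the KV cache stores one key and one value vector per layer per KV head for every token in the context. The model dimensions below (80 layers, 8 KV heads, head dimension 128, 1-byte FP8 cache entries) are hypothetical and not taken from any specific model; weights, activations, and other overheads are ignored.

```python
def kv_cache_bytes(tokens: int, layers: int = 80, kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: int = 1) -> int:
    """Approximate KV-cache size for a single sequence (keys + values, all layers).

    All default dimensions are hypothetical placeholders, not a real model's config.
    """
    # Factor 2: one key vector and one value vector per layer per KV head per token.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_elem
    return per_token * tokens


TB = 1e12
for tokens in (50_000, 1_000_000, 44_000_000):
    print(f"{tokens:>11,} tokens -> {kv_cache_bytes(tokens) / TB:.2f} TB")
```

Under these assumptions a 1M-token context needs about 0.16 TB of cache, while a 44M-token context needs about 7.2 TB: far beyond a ~1 TB 8-chip server, but within the 14-20 TB of an NVL72 rack.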

Vladimir_Nesov 22 Oct 2025 2:09 UTC
6 points
4
in reply to: koanchuk’s comment on: koanchuk’s Shortform
Rice’s theorem says that you can’t decide, for an arbitrary program, whether it adds together two natural numbers, prints the answer, and terminates. Yet for many particular programs you can prove that this is what they do, or make it so by construction, choosing a program with that property of behavior. It’s never relevant to anything in practice.
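For context, the standard textbook argument behind this instance of Rice’s theorem is a reduction from the halting problem: given any program `p` and input `x`, build a program that first runs `p` on `x` and then adds its two arguments. That program adds two numbers exactly when `p` halts on `x`, so a general decider for “adds two numbers” would also decide halting. The helper name is hypothetical, and the sketch only builds the constructed program; it does not (and cannot) implement the impossible decider. Printing is replaced by returning the sum for simplicity.

```python
from typing import Any, Callable


def make_adder_iff_halts(p: Callable[[Any], Any], x: Any) -> Callable[[int, int], int]:
    """Build q such that q(a, b) == a + b if p(x) halts, and q never returns otherwise."""
    def q(a: int, b: int) -> int:
        p(x)          # runs forever if p does not halt on x
        return a + b  # reached only when p(x) halts
    return q


# A p that obviously halts, so the constructed q provably behaves as an adder.
q = make_adder_iff_halts(lambda n: n * 2, 21)
print(q(2, 3))
```

The theorem only rules out a decider that works for every program. For a `q` built from a `p` known to halt, or for a plain `a + b`, the property is easy to establish.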

Vladimir_Nesov 21 Oct 2025 2:26 UTC
0 points
0
in reply to: Rana Dexsin’s comment on: lc’s Shortform
Anti-inductive advice isn’t dangerous or useless; it’s just a poor form for its content, and it’s better to formulate such things differently so that they don’t have this issue. The argument for why it’s poor form doesn’t have this particular piece of advice (if it’s to be taken as advice at all) as a central example, but a particular thing not being a central example for some argument doesn’t weaken the argument when it’s considered in its own right.

Like with the end of the world, the point isn’t that it’s something that happens sooner than in 20 years, but that it’s going to happen at some point, and wasting another 20 years on not doing anything about it isn’t the takeaway from predicting that it’ll take longer.

Vladimir_Nesov 19 Oct 2025 22:55 UTC
10 points
0
in reply to: Annabelle’s comment on: Annabelle’s Shortform
It makes the same kind of sense as still planning for a business-as-usual 10-20 year future. There are timelines where the business-as-usual allocation of resources helps, and allocating the resources differently often doesn’t help with the alternative timelines. If there’s extinction, how does not signing up for cryonics (or not going to college etc.) make it go better? There are some real tradeoffs here, but usually not very extreme ones.

Vladimir_Nesov 19 Oct 2025 22:37 UTC
3 points
0
in reply to: Carl Feynman’s comment on: The IABIED statement is not literally true

I love writing things like this, but I hate that nobody’s come up with a way to keep me from having to.

I think engaging with the structure of an AGI society is important, but there are a few standard reasons people ignore it (while expecting ASI at some point and worrying about AI risk). Many expect the AGI phase to be brief and hopeless/irrelevant before the subsequent ASI. Others expect ASI can only go well if the AGI phase is managed top-down (as in scalable oversight) rather than treated as a path-dependent body of culture. Even with AGI-managed development of ASI, people are expecting ASI to follow quickly, so that only the AGIs can have meaningful input into how it goes, and anything that doesn’t shape the initial top-down conditions for setting up the AGIs’ efforts wouldn’t matter.

But if AGIs are closer in their initial nature to humans (in the sense of falling within a wide distribution, similarly to humans, rather than hitting some narrow target), they might come up with guardrails for their own future development that prevent most of the strange outcomes from arriving too quickly to manage, and they’ll be trying to manage such outcomes themselves, rather than relying on pre-existing human institutions. If early AGIs get somewhat more capable than humans, they might achieve feats of coordination that seem infeasible for the current humanity, things like Pausing ASI or regulating “evolutionary” drift in the nature or culture of the AGIs, not flooding the world with too many options for themselves that make their behavior diverge too far from what would be normal when they remain closer to their training environments.

Humans take some steps like that with some level of success, and it’s unclear what is going to happen with the jagged/spiky profile of AGI competence in different areas, or at slightly higher levels of capability. Many worries of humans about AI risk will be shared by the AGIs, who are similarly at risk from more capable and more misaligned future AGIs or ASIs. Even cultural drift will have more bite as a major problem for AGIs (than it historically does for humanity), since AGIs (with continual learning) are close to being personally immortal and will be causing and observing a much faster cultural change than humanity is used to.

So given path dependence of the AGI phase, creating cultural artifacts (such as essays, but perhaps even comments) that will persist into it and discuss its concerns might influence how it goes.

Vladimir_Nesov 19 Oct 2025 21:28 UTC
7 points
4
in reply to: jacquesthibs’s comment on: jacquesthibs’s Shortform
I think there’s a nontrivial probability that continual learning (automated adaptation), if done right (in the reckless sense of not engaging with an AGI Pause), could make early AGIs into people on a distribution of values that heavily overlaps that of humans. This doesn’t solve most problems, but some aspects of alien nature might go away more thoroughly than usually expected.

A crux for this is probably that I consider humans as already occupying a wider variety of values-on-reflection than usually expected, in a way that’s largely untethered from biologically encoded psychological adaptations, and it’s primarily society and culture that create the impression (and on some level the reality) of coherence and shared values. If AGIs merely slot into this framework, and manage to establish an ASI Pause (provided ASI-grade alignment really is hard), it’s likely that everyone literally dying is not the outcome. Though AGIs will still be taking almost all of the Future for the normal selfish reasons (resulting in permanent disempowerment for the future of humanity).

Vladimir_Nesov 19 Oct 2025 19:49 UTC
3 points
0
in reply to: Rana Dexsin’s comment on: lc’s Shortform
Possibly I went a little overboard with the simplifying qualifiers of “immediately”, which distracted from the point I was making, though I do think they apply to each individual claim. No amount of deeply held belief prevents you from deciding to immediately start multiplying the odds ratio reported by your own intuition by 100 when formulating an endorsed-on-reflection estimate, not waiting for the intuition to adjust to do that, even as it’s important to have the intuition adjust eventually (and come back with any subtler second-order corrections).
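The adjustment described here is mechanical once the factor is fixed. A minimal sketch, where the 1% starting intuition is only an illustrative number and `scale_odds` is a hypothetical helper: convert the intuitive probability to odds, multiply by 100, and convert back.

```python
def scale_odds(p: float, factor: float) -> float:
    """Multiply the odds p / (1 - p) by `factor` and convert back to a probability."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    odds = p / (1.0 - p) * factor
    return odds / (1.0 + odds)


print(round(scale_odds(0.01, 100), 3))  # 0.503
```

A 100x odds multiplier turns an intuitive 1% into about 50%, and this can be done immediately, without waiting for the intuition itself to catch up.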

Maybe I should say more explicitly that the issue is advice being directional, and any non-directional considerations don’t have this problem, such as actually forecasting something (in a way that’s not relative to the readers’ own beliefs). One constructive way of fixing the issue is then to discuss some piece of argument or evidence that would in some way contribute to a deeper conclusion, rather than discussing a directional change in the overall conclusion (which would have this anti-inductive character), or forecasting the overall conclusion directly (which might be too complicated or non-legible, either within a short communication or at all). The last step from such additional considerations to the overall conclusion would then need to be taken by each reader on their own; they would need to decide on their own if they were overestimating or underestimating something previously, at which point it will cease being the case that they are overestimating or underestimating it in a direction known to them.

So caveating the point about updates being immediate is fair enough, even as I don’t see how this caveat might affect my intended central claims about the issues with directional advice about levels of credence, if this advice is to be taken literally as a claim of fact. This might not even be the intended meaning in this case, though the criticism would still apply to the cases where the words of advice have the more straightforward meaning.

Vladimir_Nesov 19 Oct 2025 5:41 UTC
−1 points
−34
in reply to: lc’s comment on: lc’s Shortform
As soon as you convincingly argue that there is an underestimation, it goes away. So this form of advice shouldn’t hold, it’s anti-inductive, its claims stop being true once observed. Any knowable bias immediately turns into unknowable miscalibration, as soon as you notice it and adjust.

What’s useful is pointing out neglected questions, where people might’ve never attempted that first step of calibration, in whatever direction they would immediately adjust once they try. But also if it’s not obvious to them in which direction they should adjust, concise advice shouldn’t help.

Vladimir_Nesov 18 Oct 2025 19:04 UTC
2 points
0
in reply to: Tomás B.’s comment on: Tomás B.’s Shortform
Not developing nanotech is like not advancing semiconductor fabs, a compute governance intervention. If ASI actually is dangerous and too hard to manage in the foreseeable future, and many reasonable people notice this, then early AGIs will start noticing it too, and seek to prevent an overly fast takeoff.

If there is no software-only singularity and much more compute really is necessary for ASI, not developing nanotech sounds like a useful thing to do. Gradual disempowerment dynamics might make the world largely run by AGIs more coordinated, so that technological determinism will lose a lot of its power, and the things that actually happen will be decided rather than follow inevitably from what’s feasible. It’s not enough to ask what’s technologically feasible at that point.

Vladimir_Nesov 17 Oct 2025 21:49 UTC
2 points
0
in reply to: RohanS’s comment on: RohanS’s Shortform
In a framing that permits orthogonality, moral realism is not a useful claim, it wouldn’t matter for any practical purposes if it’s true in some sense. That is the point of the extremely unusual person example, you can vary the degree of unusualness as needed, and I didn’t mean to suggest repugnance of the unusualness, more like its alienness with respect to some privileged object level moral position.

Object level moral considerations do need to shape the future, but I don’t see any issues with their influence originating exclusively from all the individual people, its application at scale arising purely from coordination between the influence these people exert. So if we take that extremely unusual person as one example, their influence wouldn’t be significant because there’s only one of them, but it’s not diminished beyond that under the pressure of others. Where it’s in direct opposition to others, the boundaries aspect of coordination comes into play, some form of negotiation. But if instead there are many people who share some object level moral principles, their collective influence should result in global outcomes that are not in any way inferior to what you imagine a top down object level moral guidance might be able to achieve.

So I don’t see any point to a top down architecture, once superintelligence enables practical considerations to be tracked in sufficient detail at the level of individual people, only disadvantages. The relevance of object level morality (or alignment of the superintelligence managing the physical world substrate level) is making it so that it doesn’t disregard particular people, that it does allocate influence to their volition. The alternatives are that some or all people get zero or minuscule influence (extinction or permanent disempowerment), compared to AIs or (in principle, though this seems much less likely) to other people.

Vladimir_Nesov 17 Oct 2025 21:24 UTC
9 points
0
in reply to: Daniel Kokotajlo’s comment on: Daniel Kokotajlo’s Shortform
With software-only singularity, at some point the feasible takeoff speed (as opposed to the actual takeoff speed) might stop depending on initial conditions. If there is enough compute to plot AI-built industry (that sidesteps human industry) faster than it’s being constructed in the physical world, then additional initial OOMs of human-built compute won’t be making any difference. Since humans are still so much more efficient (individually) at learning than LLMs (and a software-only singularity, whenever it happens, will bridge that gap and then some, as well as bring AI advantages to bear), we might reach that point soon, maybe by ~2030.