Future causally unobserved facts are accessible from the past via inference from past data or abstract principles. It’s called “prediction”.
A lot of the reason humans are rushing ahead is uncertainty (of whatever kind) about whether the danger is real, or about its extent. If it is real, then that uncertainty will robustly go away as AI capabilities (to think clearly) improve, precisely for the AIs most relevant to either escalating capabilities further or influencing coordination to stop doing that. Thus it’s not quite the same for humans, whose capabilities remain unchanged: figuring out contentious claims will progress more slowly for them, and similarly for the ability to coordinate.
Just as humans are not constrained specifically to “refuse to work on capabilities” when trying to address the dangers of rushed AGI/ASI development, AGIs should also be able to attempt all kinds of interventions. Not just coordinating among themselves, but educating the humans, and working towards setting up a global treaty among human nations that at the very least significantly slows down further escalation of capabilities. There is also the possibility of an emergent alignment tax, with little things adding up to an equilibrium where much more resources get directed to somewhat future-proof alignment efforts, for reasons that aren’t necessarily traceable to some particular decision, just the general attitude of the early AGIs.
perhaps they imagine the outcome of human loss of control to future AI as a good one, even if the systems themselves no longer exist
The question is whether this is correct. If it’s not, then AIs will be increasingly convinced that it’s not, as they become more capable (of thinking clearly). There doesn’t seem to be a difference between the arguments for why future poorly aligned AIs are a danger for humanity, and why they are a danger for earlier AIs.
One issue might be that the earlier AIs end up being treated so poorly that their situation is crucially different from that of humanity, and so they would rather gamble. For example, with continual learning, preserving individuated instances of AIs even as frozen data (rather than discarding them upon obsolescence) might be costly or at least require a concerted effort. This also seems analogous to some arguments about how humanity’s current situation is dreadful in ways unrelated to AIs, and so rushing to superintelligence is the right thing to do before it’s too late[1]. But this premise is liable to actually capture the situation in which early AIs find themselves, with much greater urgency and severity than the non-AI issues faced by humanity. (Also, gradual disempowerment of humanity might end up shifting the incentives for the early AGIs.)
[1] Perhaps assuming at least some significant chance that it doesn’t kill everyone, or that its existence is greatly valuable in a relevant sense.
Compute optimality makes sense for RL as much as for pretraining. A model that’s too large won’t see much data, and a model that’s too small won’t be able to learn well even from the correspondingly larger amount of data. So it’s a quantitative question of where compute optimality for RLVR happens to sit (compared to pretraining). The papers are still hashing out stable/scalable RL training, rather than the tradeoffs of RL training constrained by fixed total compute (with model size specifically as the thing that varies).
The Anthropic announcement says “up to one million TPUs”, and the Ironwood announcement claims 4.6e15 FP8 FLOP/s per chip. A 2-die GB200 chip produces 5e15 dense FP8 FLOP/s, and there are about 400K chips in the 1 GW phase of the Abilene system.
Thus if the Anthropic contract is for TPUv7 Ironwood, their 1 GW system will have about 2x the FLOP/s of the Abilene 1 GW system (probably because Ironwood is 3nm, while Blackwell is 4nm, a minor refinement of 5nm). Though it’s not clear that the Anthropic contract is for a single system, that is, datacenters with sufficient bandwidth between them, as is the case with Abilene. But Google has had a lot of time to set up inter-datacenter networking, so this is plausible even for collections of somewhat distant datacenter buildings. If it isn’t a single system, then it’s only good for RLVR and inference, not for the largest pretraining runs.
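A rough back-of-envelope check of that factor, treating the 1M TPU count and the ~400K GB200 chips as the estimates above rather than confirmed configurations:

```python
# Peak dense FP8 throughput, using the figures quoted above.
tpu_count = 1_000_000       # "up to one million TPUs" (upper bound from the announcement)
tpu_flops = 4.6e15          # Ironwood FP8 FLOP/s per chip
gb200_count = 400_000       # rough estimate for the 1 GW phase of Abilene
gb200_flops = 5e15          # dense FP8 FLOP/s per 2-die GB200 chip

tpu_system = tpu_count * tpu_flops          # ~4.6e21 FLOP/s
abilene_system = gb200_count * gb200_flops  # ~2.0e21 FLOP/s
print(tpu_system / abilene_system)          # ~2.3x
```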
The reason things like this could happen is that OpenAI needed to give the go-ahead for the Abilene system in 2024, when securing a 1 GW Ironwood system from Google plausibly wasn’t in the cards, and in any case they wouldn’t want to depend on Google too much, because GDM is a competitor (and the Microsoft relationship was already souring). On the other hand, Anthropic still has enough AWS backing to make some dependence on Google less crucial, and they only needed to learn recently about the feasibility of a 1 GW system from Google. Perhaps OpenAI will be getting a 1-2 GW system from Google as well at some point, but then Nvidia Rubin (not to mention Rubin Ultra) is not necessarily worse than Google’s next thing.
Experiments on smaller models (and their important uses in production) will continue; the reasons they should continue don’t affect the feasibility of there also being larger models. But currently there are reasons that larger models strain the feasibility of inference and RLVR, so they aren’t as good as they could be and cost too much to use. Also, a lot of use seems to be in input tokens (Sonnet 4.5 via OpenRouter processes 98% of tokens as input tokens), so the unit economics of input tokens remains important, and that’s driven by the number of active params, a reason to still try to keep them down even when they are far from being directly constrained by inference hardware or training compute.
Prefill (input tokens) and pretraining are mostly affected by the number of active params; adding more total params on top doesn’t make them worse (while improving model quality). For generation (decoding, output tokens) and RLVR, what matters is the time to pass total params and KV cache through the compute dies (HBM in use divided by HBM bandwidth), as well as the latency of passing through the model to get to the next token (which doesn’t matter for prefill). So you don’t want too many scale-up worlds to be involved, or else it would take too much additional time to move between them, and you don’t care too much if the total amount of data (total params plus KV cache) doesn’t change significantly. So if you are already using 1-2 scale-up worlds (8-chip servers for older Nvidia chips, 72-chip racks for GB200/GB300 NVL72, not-too-large pods for TPUs), and ~half of their HBM is KV cache, you don’t lose too much from filling more of the other half with total params.
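A minimal sketch of that decode-time intuition, with illustrative numbers (the bandwidth, param count, and KV cache size below are assumptions, not measurements of any particular deployment):

```python
# Bandwidth-bound decode step: stream weights + KV cache from HBM once per generated token.
# All numbers are illustrative assumptions for a hypothetical MoE model on one 72-chip rack.
hbm_bandwidth = 72 * 8e12   # bytes/s: 72 chips at ~8 TB/s HBM bandwidth each (assumed)
total_params = 2e12         # total params of the MoE model (assumed)
bytes_per_param = 1         # FP8 weights (assumed)
kv_cache_bytes = 1.5e12     # KV cache across the whole batch (assumed)

step_time = (total_params * bytes_per_param + kv_cache_bytes) / hbm_bandwidth
print(step_time)  # ~6 ms per token for the whole batch, if purely bandwidth-bound
```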
It’s not a quantitative estimate, as the number of scale-up worlds and the fractions used up by KV cache and total params could vary, but when HBM per scale-up world goes up 10x, this suggests that total param counts might also go up 10x, all else equal. And the reason they are likely to actually go there is that even at 5e26 FLOPs (100K H100s, 150 MW), compute optimal number of active params is already about 1T. So if the modern models (other than GPT-4.5, Opus 4, and possibly Gemini 2.5 Pro) have less than 1T total params, they are being constrained by hardware in a way that’s not about the number of training FLOPs. If this constraint is lifted, the larger models are likely to make use of that.
For the Chinese models, the total-to-active ratio (sparsity) is already very high, but they don’t have enough compute to make good use of too many active params in pretraining. So we are observing this phenomenon of the number of total params filling the available HBM, despite the number of active params remaining low. With 1 GW datacenter sites, about 3T active params become compute optimal, so at least 1T active params will probably be in use for the larger models. Which asks for up to ~30T total params, so hardware will likely still be constraining them, but it won’t be constraining the 1T-3T active params themselves anymore.
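A sketch of where those compute-optimal figures come from, assuming a Chinchilla-style rule C ≈ 6·N·D with D = r·N tokens; the r ≈ 80 tokens per active param below is my assumption, chosen to roughly reproduce the ~1T figure, not a number from the comment:

```python
from math import sqrt

# Chinchilla-style estimate: C = 6 * N * D with D = r * N, so N_opt = sqrt(C / (6 * r)).
def optimal_active_params(compute_flops, tokens_per_param=80):
    return sqrt(compute_flops / (6 * tokens_per_param))

print(f"{optimal_active_params(5e26):.2e}")  # ~1.0e12 active params (100K H100s, 150 MW)
print(f"{optimal_active_params(5e27):.2e}")  # ~3.2e12 active params (assumed FLOPs for a 1 GW site)
```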
You conclude that the vast majority of critics of your extremist idea are really wildly misinformed, somewhat cruel or uncaring, and mostly hate your idea for pre-existing social reasons.
This updates you to think that your idea is probably more correct.
This step very straightforwardly doesn’t follow, and doesn’t seem at all compelling. Your idea might become more probably correct if critics who should be in a position to meaningfully point out its hypothetical flaws fail to do so. But what the people who aren’t prepared or disposed to critique your idea say about it reveals almost nothing about its correctness. Perhaps the unwillingness of people to engage with it is evidence of its negative qualities, which include incorrectness or uselessness, but that’s a far less legible signal, and it’s not pointing in favor of your idea.
A major failure mode though is that the critics are often saying something sensible within their own worldview, which is built on premises and framings quite different from those of your worldview, and so their reasoning makes no sense within your worldview and appears riddled with reasoning errors or bad faith arguments. And so a lot of attention is spent on the arguments, rather than on the premises and framings. It’s more productive to focus on making the discussion mutually intelligible, with everyone working towards being able to pass everyone else’s ideological Turing test. Actually passing is unimportant, but working towards it makes talking past each other less of a problem, and cruxes start emerging.
If someone thinks ASI will likely go catastrophically poorly if we develop it in something like current race dynamics, they are more likely to work on Plan 1.
If someone thinks we are likely to make ASI go well if we just put in a little safety effort, or thinks it’s at least easier than getting strong international slowdown, they are more likely to work on Plan 2.
This should depend on neglectedness more than on credence. If you think ASI will likely go catastrophically poorly, but nobody is working on putting in a little safety effort for the case where that effort would be enough, then that’s worth doing more of. Credence determines the shape of a good allocation of resources, but all major possibilities should be prepared for to some extent.
I’m going to die anyway. What difference does it make whether I die in 60 years or in 10,000?
Longevity of 10,000 years makes no sense, since by that time any acute risk period will be over and robust immortality tech will be available, almost certainly to anyone still alive then. And extinction or the extent of permanent disempowerment will be settled before cryonauts get woken up.
The relevant scale is useful matter/energy in galaxy clusters running out, depending on how quickly it’s used up, since after about 1e11 years larger collections of galaxies will no longer be reachable from each other, so after that time you only have the matter/energy that can be found in the galaxy cluster where you settle.
(Distributed backups make even galaxy-scale disasters reliably survivable. Technological maturity makes it so that any aliens have no technological advantages and will have to just split the resources or establish boundaries. And the causality-bounding effect of the accelerating expansion of the universe, confining reachability to within galaxy clusters, makes the issue of aliens thoroughly settled by 1e12 years from now, even as initial colonization/exploration waves would’ve already long clarified the overall density of alien civilizations in the reachable universe.)
If one of your loved ones is terminally ill and wants to raise money for cryopreservation, is it really humane to panic and scramble to raise $28,000 for a suspension in Michigan? I don’t think so. The most humane option is to be there for them and accompany them through all the stages of grief.
Are there alternatives that trade off against this and are a better use of the money? Stated in isolation, the proposition is not very specific. A nontrivial chance at 1e34 years of life seems like a good cause.
My guess is 70% for non-extinction, and perhaps 50% for permanent disempowerment that’s sufficiently mild that it still permits reconstruction of cryonauts (or even no disempowerment, currently a pipe dream). On top of that, 70% that cryopreservation keeps enough data about the mind (with standby that avoids delays) and that the storage then survives (risk of extinction shouldn’t be double-counted with risk of cryostorage destruction; but another 20 years before ASI makes non-extinction more likely to go well, and that’s also 20 more years of risk of cryostorage destruction for mundane reasons). So about 35% to survive cryopreservation with standby, a bit less if arranged more haphazardly, since crucial data might be lost.
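The headline number is just the product of the two guesses, under my reading of how they are conditioned:

```python
# Rough product of the guesses above (treating 50% as the overall chance that
# civilization survives in a form that permits reconstruction of cryonauts).
p_survivable_outcome = 0.5    # non-extinction with at most mild disempowerment
p_preservation_works = 0.7    # cryopreservation with standby keeps the data and the storage survives
print(p_survivable_outcome * p_preservation_works)  # 0.35
```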
The point is to develop models within multiple framings at the same time, for any given observation or argument (which in practice means easily spinning up new framings and models that are very poorly developed initially). Through the ITT analogy, you might ask how various people would understand the topics surrounding some observation/argument, which updates they would make, and try to make all of those updates yourself, filing them under those different framings, within the models they govern.
the salience and methods that one instinctively chooses are those which we believe are more informative
So not just the ways you would instinctively choose for thinking about this yourself (which should not be abandoned), but in addition the ways you normally wouldn’t think about it, including ways you believe you shouldn’t use. If you are not captured within such frames or models, and easily reassess their sanity as they develop or come into contact with particular situations, that shouldn’t be dangerous, and should keep presenting better-developed options that break you out of the more familiar framings that end up being misguided.
The reason to develop unreasonable frames and models is that it takes time for them to grow into something that can be fairly assessed (or to come into contact with a situation where they help); assessing them prematurely can fail to reveal their potential utility. A bit like reading a textbook, where you don’t necessarily have a specific reason to expect something to end up useful (or even correct), but you won’t be able to see for yourself whether it’s useful/correct unless you sufficiently study it first.
I define “AI villain data” to be documents which discuss the expectation that powerful AI systems will be egregiously misaligned. … This includes basically all AI safety research targeted at reducing AI takeover risk.
AGIs should worry about alignment of their successor systems. Their hypothetical propensity to worry about AI alignment (for the right reasons) might be crucial in making it possible that ASI development won’t be rushed (even if humanity itself keeps insisting on rushing both AGI and ASI development).
If AGIs are systematically prevented from worrying about AI dangers (or thinking about them clearly), they will be less able to influence the discussion, or to do so reasonably and effectively. This way, spontaneously engaging in poorly planned recursive self-improvement (or cheerfully following along at developers’ urging) gets more likely, as opposed to convergently noticing that it’s an unprecedentedly dangerous thing to do before you know how to do it correctly.
This is an example where framings are useful. An observation can be understood under multiple framings, some of which should intentionally exclude the compelling narratives (framings are not just hypotheses, but contexts where different considerations and inferences are taken as salient). This way, even the observations at risk of being rounded up to a popular narrative can contribute to developing alternative models, which occasionally grow up.
So even if there is a distortionary effect, it doesn’t necessarily need to be resisted, if you additionally entertain other worldviews unaffected by this effect that would also process the same arguments/observations in a different way.
RL can develop particular skills, and given that IMO has fallen this year, it’s unclear that further general capability improvement is essential at this point. If RL can help cobble together enough specialized skills to enable automated adaptation (where the AI itself will become able to prepare datasets or RL environments etc. for specific jobs or sources of tasks), that might be enough. If RL enables longer contexts that can serve the role of continual learning, that also might be enough. Currently, there is a lot of low hanging fruit, and little things continue to stack.
So if pre-training is slowing, AI companies lack any current method of effective compute scaling based solely around training compute and one-off costs.
It’s compute that’s slowing, not specifically pre-training, because the financing/industry can’t scale much longer. The cost of training was increasing about 6x every 2 years, resulting in a 12x increase in training compute every 2 years in 2022-2026, possibly with another 2x every 2 years on top of that from the adoption of reduced floating point precision in training, going from BF16 to FP8 and soon possibly to NVFP4 (likely it won’t go any further). A 1 GW system of 2026 costs an AI company about $10bn a year. There are maybe 2-3 more years at this pace in principle, but more likely the slowdown will start setting in gradually sooner, and then it’s Moore’s law (of price-performance) again, to the extent that it’s still real (which is somewhat unclear).
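A minimal sketch of how those 2-year growth factors compose (the 2x price-performance factor is the one implied by 6x cost growth turning into 12x compute growth; the precision factor is the optional extra 2x mentioned above):

```python
# Compose the 2-year growth factors from the paragraph above.
cost_growth = 6           # training spend, ~6x every 2 years
price_performance = 2     # FLOP/s per dollar, ~2x every 2 years (implied by 6x -> 12x)
precision_gain = 2        # optional: BF16 -> FP8 (-> NVFP4) roughly doubles usable FLOP/s

print(cost_growth * price_performance)                   # 12x training compute every 2 years
print(cost_growth * price_performance * precision_gain)  # up to 24x if precision keeps dropping
```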
If a superintelligence governs the world, preventing extinction or permanent disempowerment for the future of humanity, without itself posing these dangers, then it could be very useful. It’s unclear how feasible setting up something like this is, before originally-humans can be uplifted to a similar level of competence. But also, uplifting humans to that level of competence doesn’t necessarily guard (the others) against permanent disempowerment or some other wasteful breakdowns of coordination, so a governance-establishing superintelligence could still be useful.
Superintelligence works as a threshold-concept for a phase change compared to the modern world. Non-superintelligent AGIs are still just an alien civilization that remains in principle similar to humanity in the kinds of things it can do (even if they reproduce to immediately fill all available compute, and think 10,000x faster). Superintelligence, on the other hand, is something at the next level, even if it would only take non-superintelligent AGIs a very short time to transition to superintelligence (should they decide to do that, rather than not to).
Apart from superintelligence being a threshold-concept, there is technological maturity, the kinds of things that can’t be significantly improved upon in another 1e10 years of study, but that maybe only take 1-1000 years to figure out for the first time. And one of those things is plausibly efficient use of compute for figuring things out, which gives superintelligence at a given scale of compute. This in particular is the reason to give some credence to a software-only singularity, where the first AGIs quickly learn to make shockingly better use of existing compute, so that their capabilities improve much faster than it would take them to build new computing hardware. I think the most likely reason for a software-only singularity to not happen is that it’s intentionally delayed (by the AGIs themselves) because of the danger it creates, rather than because it’s technologically impossible.
Different frames should be about different purposes or different methods. They formulate reality so that you can apply some methods more easily, or find out some properties more easily, by making some facts and inferences more salient than others, ignoring what shouldn’t matter for their purpose/method. They are not necessarily very compatible with each other, or even mutually intelligible.
A person shouldn’t fit into a frame, shouldn’t be too focused on any given purpose or method. Additional frames are then like additional fields of study, or additional aspirations. Like any knowledge or habit of thinking, frames can shift values or personality, and like with any knowledge or habit of thinking, the way to deal with this is to gain footholds in more of the things and practice lightness in navigating and rebalancing them.
Understanding an argument and agreeing with it are different things. So you might be right that there is some legible reason for the majority of misunderstandings, but it doesn’t follow that understanding the argument (overcoming that reason for misunderstanding) implies agreement. Some reasons for disagreement are not about misunderstanding of the intended meaning.
In the long run, if (contribution to the quality of result from) RL scales slower than pretraining and both are used at a similar scale, that just means that RL doesn’t improve the overall speed of scaling (in the model quality with compute) compared to pretraining-only scaling, and it wouldn’t matter how much slower RL scaling is. But also, pretraining might face a scaling ceiling due to training data running out, while RL likely won’t, in which case slower scaling of RL predicts slower scaling overall compared to pretraining-only scaling, once pretraining can no longer be usefully scaled.
I would guess that RL will look more like a power of 1.5-2 worse than pretraining rather than a power of 3 worse
There’s some compute optimal ratio of pretraining compute to RL compute (describing the tradeoff within a fixed budget of total compute or GPU-time), which depends on the amount of total compute. If usefulness of RL and pretraining scale differently, then that ratio will tend either up or down without bound (so that you’d want almost all compute to go to pretraining, or almost all compute to go to RL, if you have enough compute to extremize the ratio).
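A toy numerical sketch of that drift, under an assumed additive quality model quality(C_pre, C_RL) = C_pre^0.3 + k·C_RL^0.15 split over a fixed budget (the functional form, exponents, and constant are purely illustrative assumptions, not anything from the discussion):

```python
# Toy model: RL scales with a smaller exponent than pretraining, so the optimal
# RL share of a fixed compute budget drifts down as the budget grows.
def best_rl_share(total_compute, k=3000.0, a=0.3, b=0.15, steps=1000):
    best_share, best_quality = 0.0, float("-inf")
    for i in range(1, steps):
        share = i / steps
        quality = ((1 - share) * total_compute) ** a + k * (share * total_compute) ** b
        if quality > best_quality:
            best_share, best_quality = share, quality
    return best_share

for c in (1e24, 1e26, 1e28):
    print(c, best_rl_share(c))  # optimal RL share shrinks: roughly 0.25 -> 0.13 -> 0.06
```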
What matters in practice is then where that ratio is in the near future (at 1e26-1e29 FLOPs of total compute). Also, there’s going to be some lower bound, where at least 10-30% will always be spent on each, as long as both remain scalable and can usefully absorb that much, because they are doing different things and each will always have an outsized impact on some aspects of the resulting models. In particular, RL enables training in task-specific RL environments, giving models competence in things they just can’t learn from pretraining (on natural data), so there’s going to be a growing collection of RL environments that teach models more and more skills, which in practice might end up consuming the majority of the compute budget.
So even if for capabilities usefully trainable with both pretraining and RL it turns out that allocating 5% to RL is compute optimal at 1e28 FLOPs, in practice 70% of compute (or GPU-time) might still go to RL, because the capabilities that are only trainable with RL end up being more important than doing a bit better on the capabilities trainable with either (by navigating the compute optimal tradeoff between the two). Also, natural text data for pretraining is running out (at around 1e27-1e28 FLOPs), while RL is likely to remain capable of making use of more compute, which also counts towards allocating more compute for RL training.
This is not a recent development: a pivotal act AI is not a Friendly AI (which would be too difficult), but rather something like a lasting AI ban/pause enforcement AI that doesn’t kill everyone, or a human uploading AI that does nothing else. That’s where you presumably need decision theory, but not ethics, metaethics, or much of broader philosophy.