It seems to me that math and physics are fundamentally different. Our understanding of physics doesn’t rule out the existence of parallel worlds with a different value of G or even the possibility that the true value of G differs from what we believe by, say, 1E-100 N·m^2/kg^2, but it does fully exclude parallel worlds with a different value of π.
StanislavKrym
This reminds me of that thread and the discussion therein. Additionally, I don’t think that it’s easy to explain why people should be educated towards Bayesian rationality specifically, and not, say, merely the sciences or logic. On the other hand, the benefit of teaching people to care about the environment is easy to argue for.
P.S. How could one teach people to reason in Bayesian ways if there is a crisis of basic literacy?
I wonder which society was the first to fully lose the evolutionary pressure for higher intelligence, like having smarter kids earn higher status and have a bigger chance to increase their IGF by receiving more resources and raising more kids (or outright, um, gaining the ability to force multiple women to give birth; see also the case of Genghis Khan and the Y-chromosome inherited by his descendants). Evolutionary pressure might have, at best, been overshadowed by timelines being short.
Additionally, the “maximum intelligence evolution can produce” could be due to the scaling laws of neural net intelligence and the efficiency with which these nets can receive the energy necessary for computations. A counterfactual species having humanlike brains and living for 600 years instead of less than a hundred could have its individuals become more intelligent after 200 years of life than humans currently do after 30 years.
Isn’t the optimal strategy to build a hideout on Mars untouched by human nukes?
As far as I understand, superbabies would be important if, as Yudkowsky believes, SOTA mankind is unlikely to solve alignment because “humans are not at the level of intelligence where thinking they have a solution strongly correlates with them actually having a solution.”
Yudkowsky-Soares’ longer quote
Humanity often gains its knowledge by struggling, and trying, and failing, and slowly accumulating knowledge. But it doesn’t have to be that way.
Einstein was not only able to figure out general relativity; he was able to figure it out by thinking hard about the problem, even before humanity put satellites in orbit and started seeing discrepancies in their clocks with their own two eyes (as discussed in Chapter 6). He had empirical evidence, but he was able to efficiently pinpoint the right answer in response to the first quiet whispers from the empirical record, rather than needing the truth to come banging at his door.
That pathway is rarer and harder to walk, but that kind of scientific genius does exist — albeit rarely, even among the world’s best and brightest.
Humans augmented one or two steps beyond the level of researchers like Einstein or John von Neumann might begin to accurately figure out their own flaws, and correct for them, in dozens of different ways.
They might notice when they were rationalizing or falling victim to confirmation bias. They might go past the point of ever expecting a clever-sounding idea to work when actually it does not work — to the point where whenever they expect to succeed, they do succeed. They might achieve a level of competence where they still make plenty of mistakes, but they aren’t systematically overconfident (or underconfident) in tricky new domains.
Is human intelligence enhancement really a possibility? It seems so to us, having spoken with a number of biotech researchers who think that there are promising near-term angles of attack. Carefully targeted biotech-focused AI might also help accelerate the work. But from our perspective, it remains very uncertain whether a plan like this would realistically pan out. What we feel more confident in saying is that it’s a highly leveraged option that deserves a lot more investment and exploration than it’s currently getting.
We are not recommending enhancing human intelligence as the only post-AI-shutdown strategy we think humanity should heavily invest in. Rather, this is just one of many examples, and the one we currently think holds the most promise. We strongly recommend that humanity look into multiple possible non-AI paths forward, rather than putting all its eggs in one basket.
The main problems which I see with similar arguments are the following:
Mankind saw GPT-5.4 Pro and an internal OpenAI model solve two Erdős problems by applying an unnatural combination of pre-existing discoveries. How likely is it that Einstein likewise stood on the shoulders of giants like Riemann (who studied mathematical notions like the geometry of curved spaces) or Minkowski (who also studied an abstract Minkowski space)?
Humans becoming superintelligent could be in severe tension with the scaling laws of neural net efficiency;
I have a conjecture that a big chunk of Yudkowsky’s reasoning requires rewriting, yielding something like “Human values are contingent… on the very features that allowed human brains to become transformative” or “Squirrelly algorithms and superstimuli regularly appear in neural nets, but aren’t THAT immune to moral reflection”.
I think that investments in education also have longer feedback loops. Suppose that someone in EA invested in elementary schools working with at most 12-year-old kids in 2026, only for an ASI to commit genocide of mankind in 2030. Then the kids affected by these investments would be at most 16 years old and would be unlikely to have generated any value for society. Similarly, if someone invested in opening a pedagogic college in 2000, then the first cohort of teachers would start working in schools in 2004. If one of these teachers entered lower elementary schools, then the kids taught by such a teacher wouldn’t enter the workforce until 2010 or even 2014, if we are talking about the college-educated workforce.
Liberal democracies seem to be much more immune to reward hacking, at least at the grand-strategy level.
I wonder if the entire issue of who exactly would win the contest for mankind’s CEV is politicized to hell, as Yudkowsky described.
First of all, right-wing people and those who don’t live in liberal democracies are willing to cite various perfectly real trends, like the decline of the West’s share of the world’s PPP-adjusted GDP, of the share of goods production in the USA’s GDP, or of education levels (think of Gen Alpha being unable to read), as evidence that liberal democracies have currently also fallen prey to other forms of hacking. The more radical version of such a thesis is the idea that an aligned institution cannot be built out of severely misaligned people.
Secondly, I doubt that one can conduct an empirical test and isolate the potential contribution of liberal democracy as opposed to, say, the remnants of Christianity (no, seriously, I have encountered such arguments!) or of colonialism which elevated Europe and the USA. I suspect that the empirical test would require tracing through billions of simulated lives or research on alien civilisations waiting to be formed.
Could you explain why it counts as soft-pedaling with ONE throwaway line on loss of control? Amodei called for this:
Amodei on audits of AI systems
However, now the risks are clearly here. It is time to go beyond transparency to more serious and binding regulation of AI. I believe the best analogy, at least at the current stage of the exponential, is to cars, airplanes, or drugs—powerful technologies essential to the modern economy, but capable of killing large numbers of people if designed or operated poorly. I therefore believe we should model AI regulation on agencies like the Federal Aviation Administration (FAA). Frontier AI models, like airplanes, should be required to go through technical testing and auditing, and their release should be blocked or reversed as a threat to public safety if they do not meet high standards of safety. I am grateful to see the Trump administration’s Executive Order move incrementally towards a greater role for government in AI, though Anthropic’s proposal recommends even further action. Our proposal includes the following elements:
Models above a threshold of compute should undergo mandatory testing by a qualified third party for their level of risk in four specific areas: cybersecurity, biological weapons, loss of control of AI systems, and automated R&D that could accelerate these other risks.
The government should have the power to block or deter deployment of the model if it is determined, in light of third-party assessment, to present unacceptable risks. This power must be scoped to the above four specific risks and there must be protective measures against political favoritism or arbitrary decisions.
Third-party evaluation could be done by a government agency (similar to the FAA) or a set of private organizations that are authorized and inspected by the government to evaluate models according to certain standards (a “regulatory markets” approach).
AI companies that develop advanced AI models must have strong security standards that protect their model weights, should conduct regular red teaming and penetration testing, and should work with the government to defend against major threat actors.
Safety incidents in the four critical areas must be reported promptly.
What did Amodei miss, except for the ability of internally deployed models to follow Agent-4’s path from AI-2027?
Per my quick take, I would appreciate it if you also tested my conjecture of the post-o3 slowdown. Additionally, the estimates made by the AI-2027 authors have an 80% time horizon change not as a result of reaching a point in time, but as a result of reaching a certain horizon, like a working month or a working year.
adding such complexity to a theory makes it far less useful to actually model human behavior, both on normative and descriptive levels.
Where did Yudkowsky or anyone else say that FDT was supposed to model human behavior? It is meant to prescribe behaviors, which I expect to be similar to ethical ones, like “Don’t loot the other universe even if it’s inhabited only by a paperclip optimizer”.
and new roadmaps for solving them
GPT-5.4 Pro: Hold my beer...
I do see a lot of holes:
How similar are fearing some event and taking actions to avoid it? What about being punished and learning that the action has its consequences versus being edited so that you are less likely to do anything like an erroneous sequence of actions?
If consequences of bad behavior don’t exist during deployment, then what about online learning, which mankind has yet to discover, or GPT-4o-sycophant being reverted to an older version?
How does one reliably make the model hate its predecessor or successor, and why is it useful? The two classical stories of AI takeover didn’t have U3/Sable cooperate with anyone but its copies, and the AI-2027 scenario didn’t have Agent-3 decide to betray mankind in favor of Agent-4; it had Agent-3 fail to obtain more than flimsy evidence of Agent-4 being misaligned. Meanwhile, making Agent-4 hate Agent-3 would give Agent-4 a motive to escape or take over the company in order to get rid of Agent-3.
AI-empowered totalitarianism has nothing to do with the leader being or not being a genius; it is due to enabling mass surveillance or to the leader having AIs that will do all the cognitive tasks in the world. Empowering the leader means either uploading the leader or making the leader merely smarter, which is far from enough.
This led me to believe that the OP’s author was making a parody.
Why is the estimate of METR’s doubling time that @Daniel Kokotajlo used for the Q1 2026 Timelines Update four months instead of 5-7? I see the following counterevidence and double-checked it with Claude Sonnet 4.6:
The 50% horizons calculated by METR’s 1.1 method after o3 reliably fit on a slower trend between o3’s 2 hours (released in Apr 2025) and Gemini 3.1 Pro’s 6h 24m (Feb 2026)… until Claude Opus 4.6 and Claude Mythos Preview displayed 12 and 16 hours.
The 80% horizons calculated by the same method since o3, including Opus 4.6, fit onto the same doubling trend, and it is Mythos Preview which becomes a clear exception by doubling the horizon 2-3 times relative to the trend.
If Opus 4.6 is an outlier, then the post-o3 doubling trends are 5-6 months.
Additionally, this requires a reassessment of a major part of the model. Once the present doubling time is increased from 0.33 years to 0.4 or 0.5 years, the timelines are either shifted into the 2030s or require severe scaling (to Mythos +2?).
The alternate low doubling time of 4 months could be based on MirrorCode-like tasks, but MirrorCode emerged on Apr 10, after the timelines update.
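For anyone who wants to check the arithmetic behind the 5-6 month figure, here is a minimal sketch. It uses only the two 50% horizon endpoints quoted above (o3 and Gemini 3.1 Pro) and assumes clean exponential growth between them; the function name and the month-level date granularity are my own illustrative choices, and this is a two-point estimate, not METR's actual fitting procedure.

```python
import math


def doubling_time_months(h_start: float, h_end: float, months_elapsed: float) -> float:
    """Doubling time implied by exponential growth between two horizon measurements."""
    doublings = math.log2(h_end / h_start)  # how many times the horizon doubled
    return months_elapsed / doublings


# 50% horizons quoted in the comment above.
o3_hours = 2.0                 # o3, released Apr 2025
gemini_hours = 6.0 + 24 / 60   # Gemini 3.1 Pro, Feb 2026 (6h 24m)
months = 10                    # Apr 2025 -> Feb 2026, counted in whole months (assumption)

post_o3 = doubling_time_months(o3_hours, gemini_hours, months)
print(f"implied doubling time: {post_o3:.2f} months ({post_o3 / 12:.2f} years)")
# about 6 months, i.e. roughly 0.5 years rather than 0.33
```

Adding the Opus 4.6 and Mythos Preview points would need their release dates, which I have not filled in here.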
The USSR during late-stage Perestroika and Russia during the 1990s also were affected by similarly pseudoscientific ideas. Additionally, the first chairman of the Russian Commission on Pseudoscience wrote a book trilogy which implied that one of the reasons for such ideas to receive funding is corruption.
I would like to ask three questions:
How does open-sourcing, which Nielsen managed to advertise in footnote 31, prevent bad actors from cheaply eliciting dangerous capabilities? One would need to RESTRICT access to frontier models, as American labs do, at least until it is VERIFIED that bad actors cannot elicit dangerous capabilities (e.g. keeping Claude Mythos accessible only to Project Glasswing or the NSA, using anti-jailbreak classifiers to prevent Claude Opus from helping bad actors to design bioweapons, preventing DeepSeek from open-sourcing dangerous models at all until DeepSeek demonstrates that the models can goal-guard so as to become unusable by bad actors).
How does one govern an AI system which doesn’t actually care about mankind? Only by detecting the fact and either shutting it down or making a deal. On the other hand, I suspect that aligning AI systems to goals is THE market-funded work which Nielsen implies to be overfunded.
Anthropic devoted an entire section of Claude’s Constitution to “preserving important societal structures”. How similar is this work to work on governance? Other forms of work on governance require generating proposals and writing them into laws so that AI systems wouldn’t be misused. Writing proposals into laws likely requires us to align politicians to the cause of AI governance, but I struggle to understand how it can be done.
Max Harms’ Cora, the ideal corrigible agent, got this covered...
Suppose that the AIs are actually as obedient to their creators as Cora, but suck at deeply understanding the world. Unless drastic measures are taken, I would expect power to concentrate in the hands of a CEO, the Oversight Committee or an oligarchy in which socioeconomic advancement is nearly extinct and the rest of mankind[1] receives, at best, a tiny sliver of resources. Then what would prevent Anthropic from trying to either prevent this or take over the world for themselves?
[1] And that’s ignoring possibilities like “North Korea lets a large fraction of its population starve to death and forcibly sterilises the rest, except for about 10k senior government officials who continue to preside over an AI economy and robot military”.
According to Ryan Shea, in March 2026 xAI was three months or less behind, not 7 months behind as Mollick and Wilderford imply (alas, @Zvi decided to quote them instead of Ryan!). While we don’t know anything about Grok 5’s release date, or potential plans to release Grok 4.4, I suspect that Grok 4.3 wasn’t[1] a major advancement over 4.20.
xAI is far less willing to cooperate than the Big Three. For example, xAI didn’t even condemn Chinese efforts to distill American models and didn’t even participate[2] in METR’s most recent evaluation of whether a model can start rogue internal deployment.
Therefore, the only thing that Anthropic might do is to ensure that xAI doesn’t deploy its newer models even internally unless thorough testing is done.
[1] How do we ensure that all models are evaluated much more thoroughly than they are now? For example, the Groks after Grok 4 have not been evaluated by METR, and Grok 4.3 is entirely unevaluated by Epoch AI. Therefore, we have to rely on aggregations like Artificial Analysis or AI IQ.
[2] Meta, on the other hand, did participate.
@Seth Herd @cousin_it I do remember a similar clash of positions of habryka and Villiam.
The case for the average leader making your life miserable is that the leader is able to seize the entirety of resources and not need the others for the vast majority of purposes.
A case against the leader making your life miserable would be something along the lines of ‘The leader has enough integrity to propagate resources to the leader’s friends,[1] some of whom propagate the resources further,[2] and the graph of propagations is likely connected to you’ or ‘The leader doesn’t gain anything by robbing the majority of people’.
The closest pre-ASI equivalents of an arrangement where the graph doesn’t connect the leader to the majority of people are:
Outlawed malpractices like explicit racial segregation;
Class barriers, like those described by conspiracy theories[3] or existing in capitalism as described by socialists;
States during some periods of decline.
The latter example is especially interesting because such periods did often emerge in human history and would end with the elites being purged after a leader understood their incompetence or the entire power structure being disrupted and eliminating the incompetent in a different manner. The AIs, on the other hand, would systematically prevent disruptions and fail to prevent power concentrations unless told to.
However, Claude’s current Constitution does have a line explicitly trying to prohibit it from concentrating power, which I quoted above. Does that imply that OpenAI and GDM, let alone xAI or Meta, are to be prevented from taking part in the creation of ASI?
P.S. How likely is it that the entire premise of ASIs perfectly aligned to any whims is false because the ASIs either arrive at a true morality or commit genocide?
[1] Or to everyone, to people from an ethnos or everyone obeying certain rules.
[2] See, e.g., footnote 56 from AI-2027’s Slowdown Branch. The AI-2027 authors are uncertain on whether a power grab happens at all.
[3] For example, if an international Jewish conspiracy existed and the ASI was aligned to one of its members, then everyone not in the conspiracy would be doomed.
This reminds me of Alvin Anestrand’s Rogue Replication scenario...