I can’t find any existing objection to this, including from actual philosophers
What would you say about the line from the Catechism: “For the Son of God became man so that we might become God”, which I take as growing in capabilities and alignment?
Yes, I would be very interested in reading such a blog. I am also interested in a more pragmatic question of how actual alien life could have become intelligent and created a civilisation, for which there could be relevant posts like Cannell’s Brain Efficiency (which I assume to rule out bigger neural nets on the ground of energy-based constraints), a nonexistent post on scaling laws of brains depending on size and time lived (in a manner similar to Claudes’ trend (or logarithm thereof?!) over time?) and another nonexistent post on how a civilisation would find it HARD to emerge before specific conditions caused the Universe to create habitable planets and to let life actually grow there.
advocating for regulating AI on the basis of harms like this is bad, will eventually backfire, and should be modeled as a cost, not as a benefit, from the perspective of predicting how useful his actions will be on AI safety topics.
My main crux is the societal awareness[1] around capabilities of existing and future AIs and robots. I once remarked that laypeople do NOT foresee AI taking over the world because they fail to understand[2] the true extent of AIs’ transformative potential to the point of believing that the AIs have zero reasoning abilities.
A layperson who believed such nonsense would also believe that the AIs are at most toys capable of doing things like driving people into psychosis or destroying people’s relationships, but not of wiping mankind out and sustaining the civilisation. Therefore, I don’t understand how a safetyist can win the 2026 election while not making Bores-like factual errors.
UPD: on March 14 I remarked that the IABIED march had gathered 898 pledgers and 1630 people who signed to be notified of IABIED’s activities. By June 19 these numbers reached only 1105 and 1991.
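For what it's worth, the growth implied by these figures can be computed directly. Below is a minimal sketch; the year 2026 is my assumption (the update gives only the day and month), and the variable and function names are purely illustrative.

```python
from datetime import date

# Figures quoted in the update above. The year is an ASSUMPTION: the text
# gives only "March 14" and "June 19".
start, end = date(2026, 3, 14), date(2026, 6, 19)
pledgers = (898, 1105)   # (March 14, June 19)
notified = (1630, 1991)  # (March 14, June 19)


def relative_growth(before: int, after: int) -> float:
    """Fractional increase from `before` to `after`."""
    return (after - before) / before


days = (end - start).days
pledger_growth = relative_growth(*pledgers)
notified_growth = relative_growth(*notified)

print(f"{days} days elapsed")
print(f"pledgers: +{pledger_growth:.1%}")
print(f"notified: +{notified_growth:.1%}")
```

Under these assumptions both counts grew by a bit over a fifth in roughly three months.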
UPD: A similar point was made by Boaz Barak on the fourth fake graph.
Examples include Freddie de Boer’s attempt to drive home the point that the AIs are unlikely to acquire such capabilities, Casey Simpson’s video full of factual errors, or a video released in MAY 2026 whose author claims that AI is a technology with no lived experience and zero reasoning abilities. The latter video was released AFTER GPT-5.4 Pro solved an Erdős problem, but before the triumph of an unreleased OpenAI model.
I would spread untrustworthiness to open-weight companies which operate at far lower security standards. And that’s ignoring the threat of rogue replication which open-sourced models find easier to enact...
I wonder in which sense Yudkowsky’s first interpretation was bad. When his long list of reasons for AGI to be deadly was reevaluated in 2026, the most load-bearing crux from point 25 stayed unsolved and Yudkowsky’s other disproven ideas didn’t seem enough to super-reliably avert the disaster.
Thank you! I always wanted such a site to exist so that we could track Chinese progress as opposed to American progress. Additionally, I’d like to add the ability to track the progress of untrustworthy companies like xAI as opposed to that of trustworthy ones.
If GLM 5.2 is as performant as GPT-5.5, then the AI-2027-like race is looming. I hope that @Zvi provides more details...
Increased secrecy seems dangerous, but highly capable open weight models are perhaps more so?
This reminds me of Alvin Anestrand’s Rogue Replication scenario...
It seems to me that math and physics are fundamentally different. Our understanding of physics doesn’t rule out the existence of parallel worlds with a different value of G or even the possibility that the true value of G differs from what we believe by, say, 1E-100 N·m^2/kg^2, but it does fully exclude parallel worlds with a different value of π.
This reminds me of that thread and the discussion therein. Additionally, I don’t think that it’s easy to explain why people should be educated towards Bayesian rationality specifically, and not, say, merely sciences or logic. On the other hand, it is easy to convince people that they should care about the environment.
P.S. How could one teach people to reason in Bayesian ways if there is a crisis of basic literacy?
I wonder which society was the first to fully lose the evolutionary pressure for higher intelligence, like having smarter kids earn higher status and have a bigger chance to increase their IGF by receiving more resources and raising more kids (or outright, um, gaining the ability to force multiple women to give birth; see also the case of Genghis Khan and the Y-chromosome inherited by his descendants). Evolutionary pressure might have, at best, been overshadowed by timelines being short.
Additionally, the “maximum intelligence evolution can produce” could be due to scaling laws of neural net intelligence and efficiency with which these nets can receive energy necessary for computations. A counterfactual species having humanlike brains and living for 600 years instead of less than a hundred could have its individuals become more intelligent after 200 years of life than the humans currently do after 30 years.
Isn’t the optimal strategy to build a hideout on Mars untouched by human nukes?
As far as I understand, superbabies would be important if, as Yudkowsky believes, SOTA mankind is unlikely to solve alignment because “humans are not at the level of intelligence where thinking they have a solution strongly correlates with them actually having a solution.”
Yudkowsky-Soares’ longer quote
Humanity often gains its knowledge by struggling, and trying, and failing, and slowly accumulating knowledge. But it doesn’t have to be that way.
Einstein was not only able to figure out general relativity; he was able to figure it out by thinking hard about the problem, even before humanity put satellites in orbit and started seeing discrepancies in their clocks with their own two eyes (as discussed in Chapter 6). He had empirical evidence, but he was able to efficiently pinpoint the right answer in response to the first quiet whispers from the empirical record, rather than needing the truth to come banging at his door.
That pathway is rarer and harder to walk, but that kind of scientific genius does exist — albeit rarely, even among the world’s best and brightest.
Humans augmented one or two steps beyond the level of researchers like Einstein or John von Neumann might begin to accurately figure out their own flaws, and correct for them, in dozens of different ways.
They might notice when they were rationalizing or falling victim to confirmation bias. They might go past the point of ever expecting a clever-sounding idea to work when actually it does not work — to the point where whenever they expect to succeed, they do succeed. They might achieve a level of competence where they still make plenty of mistakes, but they aren’t systematically overconfident (or underconfident) in tricky new domains.
Is human intelligence enhancement really a possibility? It seems so to us, having spoken with a number of biotech researchers who think that there are promising near-term angles of attack. Carefully targeted biotech-focused AI might also help accelerate the work. But from our perspective, it remains very uncertain whether a plan like this would realistically pan out. What we feel more confident in saying is that it’s a highly leveraged option that deserves a lot more investment and exploration than it’s currently getting.
We are not recommending enhancing human intelligence as the only post-AI-shutdown strategy we think humanity should heavily invest in. Rather, this is just one of many examples, and the one we currently think holds the most promise. We strongly recommend that humanity look into multiple possible non-AI paths forward, rather than putting all its eggs in one basket.
The main problems which I see with similar arguments are the following:
Mankind saw GPT-5.4 Pro and an internal OpenAI model solve two Erdős problems by applying an unnatural combination of pre-existing discoveries. How likely is it that Einstein stood on the shoulders of giants like Riemann (who studied mathematical notions like the geometry of curved spaces) or Minkowski (who also studied an abstract Minkowski space)?
Humans becoming superintelligent could face a potential severe tension with scaling laws of neural net efficiency;
I have a conjecture that a big chunk of Yudkowsky’s reasoning requires rewriting, yielding something like “Human values are contingent… on the very features that allowed human brains to become transformative” or “Squirrelly algorithms and superstimuli regularly appear in neural nets, but aren’t THAT immune to moral reflection”.
I think that investments in education also have longer feedback loops. Suppose that someone in EA invested in elementary schools working with at most 12-year-old kids in 2026 only for an ASI to commit genocide of mankind in 2030. Then kids affected by these investments would be at most 16 years old and would be unlikely to generate any value to the society. Similarly, if someone invested in opening a pedagogic college in 2000, then the first cohort of teachers would start working in schools in 2004. If one of these teachers entered lower elementary schools, then kids taught by such a teacher wouldn’t enter the workforce until 2010 or even 2014, if we are talking about college-educated workforce.
Liberal democracies seem to be much more immune to reward hacking, at least at the grand-strategy level.
I wonder if the entire issue of who exactly would win the contest for mankind’s CEV is politicized to hell, as Yudkowsky described.
First of all, right-wing people and those who don’t live in liberal democracies are willing to cite various perfectly real trends like the decline of the West’s share of the world’s purchasing-power-parity-adjusted GDP, the share of production of goods in the USA’s GDP or education levels (think of Gen Alpha being unable to read) as evidence that liberal democracies have currently also fallen prey to other forms of hacking. The more radical version of such a thesis is the idea that an aligned institution cannot be built out of severely misaligned people.
Secondly, I doubt that one can conduct an empirical test and isolate the potential contribution of liberal democracy as opposed to, say, the remnants of Christianity (no, seriously, I have encountered such arguments!) or of colonialism which elevated Europe and the USA. I suspect that the empirical test would require tracing through billions of simulated lives or research on alien civilisations waiting to be formed.
Could you explain why it is softpedaling with ONE throwaway line on loss of control? Amodei called for this:
Amodei on audits of AI systems
However, now the risks are clearly here. It is time to go beyond transparency to more serious and binding regulation of AI. I believe the best analogy, at least at the current stage of the exponential, is to cars, airplanes, or drugs—powerful technologies essential to the modern economy, but capable of killing large numbers of people if designed or operated poorly. I therefore believe we should model AI regulation on agencies like the Federal Aviation Administration (FAA). Frontier AI models, like airplanes, should be required to go through technical testing and auditing, and their release should be blocked or reversed as a threat to public safety if they do not meet high standards of safety. I am grateful to see the Trump administration’s Executive Order move incrementally towards a greater role for government in AI, though Anthropic’s proposal recommends even further action. Our proposal includes the following elements:
Models above a threshold of compute should undergo mandatory testing by a qualified third party for their level of risk in four specific areas: cybersecurity, biological weapons, loss of control of AI systems, and automated R&D that could accelerate these other risks.
The government should have the power to block or deter deployment of the model if it is determined, in light of third-party assessment, to present unacceptable risks. This power must be scoped to the above four specific risks and there must be protective measures against political favoritism or arbitrary decisions.
Third-party evaluation could be done by a government agency (similar to the FAA) or a set of private organizations that are authorized and inspected by the government to evaluate models according to certain standards (a “regulatory markets” approach).
AI companies that develop advanced AI models must have strong security standards that protect their model weights, should conduct regular red teaming and penetration testing, and should work with the government to defend against major threat actors.
Safety incidents in the four critical areas must be reported promptly.
What did Amodei miss, except for the ability of internally deployed models to follow Agent-4′s path from AI-2027?
Per my quick take, I would appreciate it if you also tested my conjecture of the post-o3 slowdown. Additionally, the estimates made by the AI-2027 authors have an 80% time horizon change not as a result of reaching a point in time, but by reaching a certain horizon like a working month or a working year.
adding such complexity to a theory makes it far less useful to actually model human behavior, both on normative and descriptive levels.
Where did Yudkowsky or anyone else say that FDT was supposed to model human behavior? It is meant to prescribe behaviors which I expect to be similar to ethical ones, like “Don’t loot the other universe even if it’s inhabited only by a paperclip optimizer”.
and new roadmaps for solving them
GPT-5.4 Pro: Hold my beer...
Unfortunately, such a plan conflicts with Wikipedia’s goals. It is supposed to neutrally describe noteworthy topics as represented in Reliable Sources and not to be a propaganda tool.