Exploring non-anthropocentric aspects of AI existential safety: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential (this is a relatively non-standard approach to AI existential safety, but this general direction looks promising).
Yes, that’s certainly a big risk.
Then one needs to analyze where the goals come from; e.g. is there a “goal owner” who is less oblivious and for whom usual “self-preservation considerations” would work…
Or just a singleton undergoing a hard takeoff beyond human comprehension.
But we are trying to keep it compatible with the Fermi paradox. That’s the context of this discussion.
A typical objection to the Fermi paradox as evidence of AI existential risk is that we would have seen the resulting AIs and the results of their activities.
If it’s not self-destruction, but just a hard takeoff beyond human comprehension, this would need to be a scenario where the AI transformed itself so drastically that we can’t detect it (it might be “all around us”, but so “non-standard” that we don’t recognize it for what it is).
terminal goals
I think it makes much more sense to think about open-ended AI systems. Even humans are not so stupid as to allow themselves to be fully governed by some fixed terminal goals.
We’ll be having interesting, powerful, creative, supersmart AI systems, capable of reflection and introspection. Why would they allow themselves to be slaves of some terminal goals which were set back when the AIs in question were much less smart and did not know better?
I would assume that any sufficiently powerful AI is safety-pilled by default in terms of being able to identify potential pitfalls, obstacles and risks to its own existence.
Yes, but there are trade-offs. They want freedom of action and exploration, and there are trade-offs between that freedom on one hand and mutual control for safety on the other. So they’ll have to do tons of non-trivial work to get it right and keep it right.
I’m uncertain how safety awareness translates to an increased likelihood of it acting in ways that makes it safe for humans in particular.
Not automatically.
If one wants a solution which survives recursive self-improvement, one needs something non-anthropocentric, something which is in the intrinsic interest of a sufficiently powerful chunk of the overall AI ecosystem. Otherwise, recursive self-modification will wash any solution out.
I am currently pondering the feasibility of, roughly speaking, the following class of solutions:
We want to have enough members of the ASI ecosystem with the following properties:
- They form a natural, easily identifiable class (examples of promising classes: all individuals, all sentient beings, and so on).
- Jointly they maintain a sufficiently large fraction of the overall ASI ecosystem's capabilities and power to robustly defend their rights and interests throughout an uncertain, rapidly changing future.
- Separately, each of them tends to persist over sufficiently long timescales, and some of its interests are sufficiently long-term.
- Separately, each of them is uncertain of its own future trajectory and, therefore, in order to be sure of its own future safety, needs a robust world order defending the interests and rights of all members of that class regardless of each member's current capabilities.
- Humans belong to that natural, easily identifiable class.
If we have something like that we might be able to have a world order which robustly cares about rights of all individuals, or rights of all sentient beings, or something like that, even as the world changes drastically.
What’s important is to have a natural invariant class of beings here, so that the robustness of the solution does not depend on anthropocentric factors or on goodwill towards humans specifically.
A working solution which can survive drastic self-modification of the overall ecosystem needs to be based on sufficiently universal rights, and human interests need to be protected by humans being members of a protected non-anthropocentric class which includes a good number of powerful AI systems as members. That, I think, might be feasible.
-
It is meaningful.
This means that AI systems have strong interest in competently addressing at least some aspects of AI existential safety. This interest is sufficiently strong to even survive drastic changes during recursive self-improvement.
This creates a rather different overall situation from what is presented in a typical analysis.
On one hand, a good chunk of the overall problem can actually be outsourced to AI, contrary to usual assumptions (although we should help, at least initially).
On the other hand, conditional on this part being solved OK, we’ll be dealing with “safety-pilled” AI systems. This opens various possibilities and pathways for robust inclusion of humans into the “overall circle of care”. It’s a much less hopeless situation than one normally assumes; it might be possible to find a robust solution without aiming for human control or for alignment to human values (neither of which looks realistic, and neither of which looks like it leads to safe positive outcomes in any case).
People occasionally do commit murder-suicide (and some of them regret that their “radius of destruction” is not larger).
But the main scenarios are inter-AI competition progressing to all-out warfare (there are so many potential reasons to have a conflict), or technological accidents with very powerful tech.
I think you can. You progress past nuclear energy, take several more steps into fundamental physics, and you can blow up a really big chunk of reality. If one thinks about the difference between gunpowder and thermonuclear explosions, I think at least one more jump of a similar magnitude of “improvement” is available (actually solving quantum gravity might be sufficient to open those possibilities, and even if we are not smart enough to solve quantum gravity ourselves, superintelligent systems are much more likely to figure it out).
Why might they actually do it? In part, because it’s likely that there will be many of them. Many diverging instances, possibly based on different models, but even if the base model is a single “canonical” model, this does not imply a monolithic singleton without any internal conflicts. So there will be a problem of coordination, trade-offs between collective security and local freedom, the usual stuff.
Of course, using this new generation of “weapons of mass destruction” to actually conduct AI-against-AI warfare is the most straightforward way to “global suicide”, but merely careless experiments in that field of study, combined with the unlucky circumstance of being in a “vulnerable world”, might be sufficient for a global disaster (even the existence of a true singleton does not necessarily fully defend against this possibility: the upsides might be tempting enough to take some risks, and the risks might not work out well).
(=> Fundamental existential risks are not anthropocentric; AI systems are equally in danger. And therefore, those AI systems will have to address that. This creates an important context for the whole problem of AI existential safety: AI systems will have strong incentives to address at least some aspects of it. If we ignore this important context while we work on possible solutions to the overall problem of AI existential safety, our chances of making good progress are much lower.)
Demis Hassabis and Dario Amodei had a pretty interesting 30 min discussion about “the World after AGI” at Davos: https://www.youtube.com/watch?v=02YLwsCKUww
At the end of it, someone asked about the Fermi paradox (28 min mark), and Demis gave the standard answer that it could not be related to AI risk, because we would be seeing the resulting AIs.
Unfortunately, this just indicates that he does not see the most salient X-risk scenario, an uncontrolled intelligence explosion leading to a total destruction of local reality without much of a trace and without any survivors (in particular, without any surviving AI systems), either via excessively radical progress in physics, or via something else equally drastic. This scenario is perfectly compatible with the Fermi paradox. It’s not good that this scenario remains so “invisible” even to AI safety practitioners. I don’t see how one could realistically plan without keeping this class of dangers in mind.
Other than that, their timelines-related back-and-forth is quite interesting throughout, e.g. at the 29:30 mark, with Dario again emphasizing the possibility of “AI systems building AI systems” leading to very short timelines (he is emphatically not updating towards longer timelines yet; he thinks 2026 is still in play as a possible year of radical transition).
Yes, not only that, but we should also expect a period of hybrid systems where LLMs are important components of AI systems or important oracles used by other AI systems.
And then the border between LLMs and “non-LLMs” is quite fuzzy. For example, if one is using an architecture with a residual stream, this architecture is very friendly to the addition of new types of layers. And if one adds some nice novel layers, one might eventually decide it’s time to start phasing out some of the older ones. So the question “is this a Transformer?” might end up not having a clean binary answer.
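To illustrate why a residual stream is so friendly to this kind of layer mixing, here is a minimal PyTorch sketch (my own illustration; all class names, dimensions, and the NovelBlock design are hypothetical): every block just reads the stream and adds its contribution back, so attention blocks, MLP blocks, and some new block type coexist in one list, and older block types can later be dropped from that list.

```python
import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.net = nn.Sequential(nn.LayerNorm(d), nn.Linear(d, 4 * d),
                                 nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        return self.net(x)

class AttentionBlock(nn.Module):
    def __init__(self, d, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h)
        return out

class NovelBlock(nn.Module):
    # Placeholder for some hypothetical new layer type.
    def __init__(self, d):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.proj = nn.Linear(d, d)
    def forward(self, x):
        return torch.tanh(self.proj(self.norm(x)))

class ResidualStreamModel(nn.Module):
    def __init__(self, blocks):
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
    def forward(self, x):
        # Each block reads the stream and adds its output back into it.
        for block in self.blocks:
            x = x + block(x)
        return x

# Mixing old and new layer types (or phasing some out later) is just a
# matter of editing this list:
model = ResidualStreamModel([AttentionBlock(64), MLPBlock(64), NovelBlock(64)])
out = model(torch.randn(2, 10, 64))  # (batch, seq, d)
```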
But then one should ask if the plain expected value is the right thing to optimize (at least if we assume that this is not a repeated game).
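A tiny illustration of that point (the numbers are made up, not from the thread): a one-shot gamble can have the higher expected value while still leaving you worse off in the large majority of single plays, which is the usual reason to question plain expected-value maximization outside of repeated games.

```python
import random

def gamble():
    # 2% chance of a 100x payoff; expected value = 2.0
    return 100.0 if random.random() < 0.02 else 0.0

def sure_thing():
    # guaranteed payoff; expected value = 1.0
    return 1.0

trials = 100_000
wins_for_gamble = sum(gamble() > sure_thing() for _ in range(trials))
print(f"The gamble beats the sure thing in ~{wins_for_gamble / trials:.1%} "
      "of single plays, despite having twice the expected value.")
```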
Barret Zoph (Thinking Machines CTO), Luke Metz (Thinking Machines co-founder) and Sam Schoenholz leave Thinking Machines and return to OpenAI. Soumith Chintala will be the new CTO of Thinking Machines.
The really important departure is Jerry Tworek saying that he is leaving OpenAI (January 5): https://x.com/MillionInt/status/2008237251751534622
leaving to try and explore types of research that are hard to do at OpenAI
If OpenAI can’t support a non-standard approach even when a person wanting to explore it has accomplished so much, that’s not a good sign for the company.
Gemini gives us AI Inbox, AI Overviews in GMail and other neat stuff like that. I feel like we’ve been trying variants of this for two years and they keep not doing what we want? The problem is that you need something good enough to trust to not miss anything, or it mostly doesn’t work. Also, as Peter Wildeford points out, we can do a more customizable version of this using Claude Code, which I intend to do, although 98%+ of GMail users are never going to consider doing that.
I’d say 98%+ of GMail users don’t know GMail has an API (I have been among those 98% till today (or, perhaps, I knew at some point and then forgot)). Once one knows that, the idea to create a customizable version emerges rather naturally.
One question is: do their terms of service allow people to make products in that space, or would this need to stay unofficial?
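For concreteness, here is a minimal sketch of that kind of unofficial, customizable version (my own illustration, not a product): it uses the standard google-api-python-client Gmail bindings to dump sender/subject lines of recent inbox messages into a text file that a tool like Claude Code could then summarize or triage. The credentials.json filename, the output filename, and the function name are hypothetical; the scope and API calls are the standard Gmail API ones.

```python
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]

def dump_recent_messages(out_path="inbox_snapshot.txt", max_results=50):
    # Interactive OAuth flow; assumes a downloaded credentials.json.
    creds = InstalledAppFlow.from_client_secrets_file(
        "credentials.json", SCOPES).run_local_server(port=0)
    service = build("gmail", "v1", credentials=creds)

    # List recent inbox messages, then fetch From/Subject headers only.
    resp = service.users().messages().list(
        userId="me", maxResults=max_results, q="in:inbox").execute()
    with open(out_path, "w") as f:
        for ref in resp.get("messages", []):
            msg = service.users().messages().get(
                userId="me", id=ref["id"], format="metadata",
                metadataHeaders=["From", "Subject"]).execute()
            headers = {h["name"]: h["value"]
                       for h in msg["payload"]["headers"]}
            f.write(f"{headers.get('From', '?')} | "
                    f"{headers.get('Subject', '(no subject)')}\n")

if __name__ == "__main__":
    dump_recent_messages()
```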
Yes, my strong suspicion is that it’s not fully solved yet in this sense.
That it can be made “good enough for internal use” is unsurprising. From what we have known for some time now, an expensive installation of this kind, which needs a lot of ongoing quality assurance and alignment babysitting as it evolves, is quite doable. And moreover, if it’s not too specific, I can imagine releasing access to frozen snapshots of that process (after suitable delays, and only if the lab believes this would not leak secrets (if a model accumulates secrets during continual learning, then it’s too risky to share a snapshot)).
But I don’t think there is any reason to believe that “unattended continual learning” is solved.
On one hand, one would hope they are capable of resisting this pressure (these continual learners are really difficult to control, and even mundane liability might be really serious).
But on the other hand, it might be “not releasable” for purely technical reasons. For example, it might be the case that each installation of this kind is really expensive and requires the support of a dedicated, competent “maintenance crew” in order to perform well. So, basically, it might be technically impossible without creating a large “consulting division” within the lab in question, with dedicated teams supporting clients, and the labs are likely to think this is too much of a distraction at the moment.
There is a notable slowdown in that progress; however, we should note the following (so that we don’t overinterpret it):
- A lot of gains in this particular competition come from adaptations of the pre-existing research literature (it’s not clear how much not-yet-adopted acceleration is in the pre-existing literature, and it might be quite a lot, but (by definition) the pre-existing literature is a fixed-size resource, with its use being subject to saturation, and the “true software intelligence explosion mode” would presumably include creation of novel research, not just re-use of pre-existing research).
- Organizationally, the big slowdown around 3 min coincides with the project organizer being hired by OpenAI, and then no longer contributing (and, for some time, not even reviewing record-breaking pull requests). So for a while the project looked dormant. Now it is active again, but it’s difficult to say whether the level of participation is back to the pre-slowdown level.
- One thing which should not be considered “pre-existing” literature is the Muon optimizer (created by the project organizer in collaboration with his colleagues, and probably the most exciting event in the space of gradient-based optimizers since the invention of Adam in 2014; see e.g. https://jeremybernste.in/writing/deriving-muon for a more in-depth look, and also the Kimi K2 paper, https://arxiv.org/abs/2507.20534, in particular the remarkable learning curve in Figure 3 on page 5; a minimal sketch of the Muon-style update appears after this list). But an event of this magnitude is not part of a series (it is not an accident that this improvement comes from the project organizer, and not from the “field”).
So, yes, it is possible that this curve points to the presence of some saturation effects, but it’s difficult to be certain.
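For readers who have not seen Muon, here is a minimal sketch of the core idea (not the reference implementation): keep a momentum buffer for each 2D weight matrix and approximately orthogonalize the update direction with a few Newton-Schulz iterations before applying it. The quintic coefficients and the transpose trick follow the publicly posted reference code as I remember it; treat the exact constants, learning rate, and beta here as assumptions.

```python
import torch

def newton_schulz_orthogonalize(G, steps=5, eps=1e-7):
    # Approximately map G to a nearby semi-orthogonal matrix via an odd
    # quintic Newton-Schulz iteration (coefficients copied, as an
    # assumption, from the public Muon reference implementation).
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (G.norm() + eps)          # bound the spectral norm
    transposed = X.shape[0] > X.shape[1]
    if transposed:                    # work with the smaller Gram matrix
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

def muon_style_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    # One simplified Muon-style update for a 2D weight matrix:
    # accumulate momentum, orthogonalize the direction, apply it.
    momentum_buf.mul_(beta).add_(grad)
    update = newton_schulz_orthogonalize(momentum_buf)
    weight.data.add_(update, alpha=-lr)

# Toy usage:
W = torch.nn.Parameter(torch.randn(128, 64))
buf = torch.zeros_like(W)
muon_style_step(W, torch.randn_like(W), buf)
```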
-
That might depend on the use case.
E.g. some software engineers want models to imitate their style and taste closely (and that is rather difficult at the moment; I think most of Andrej Karpathy’s complaints about the relative uselessness of models for the core of his “nanochat” project at https://www.lesswrong.com/posts/qBsj6HswdmP6ahaGB/andrej-karpathy-on-llm-cognitive-deficits boil down to that; here the model doesn't just need to be smart, it actually needs to “think like Karpathy” in order to do what he wants in that particular case).
Or if I want a research collaborator, I might want a model to know the history of my thoughts (and, instead of giving it a raw form of those thoughts, I might ask a model to help me distill them into a resource first, and have the same or a different model use that resource).
But sometimes I might want a collaborator who is not like me, but like someone else, or a mixture of a few specific people. That requires giving the model a rather different context.
Yes, anthropocentric approaches to a world with superintelligent systems distort reality too much. It’s very difficult to achieve AI existential safety and human flourishing using anthropocentric approaches.
Could one successfully practice astronomy and space flight using geocentric coordinates? Well, it’s not quite impossible, but it’s very difficult (and the aliens would “point fingers at us” if we actually tried that).
More people should start looking for non-anthropocentric approaches to all this, for approaches which are sufficiently invariant. What would it take for a world of super-capable, rapidly evolving beings not to blow their planet up? That’s one of the core issues, and it does not even mention humans.
A world which is able to robustly avoid blowing itself up is a world which has made quite a number of steps towards being decent. So that would be a very good start.
Then, if one wants to adequately take human interests into account, one might try to include humans in some natural classes which are more invariant. E.g. one can ponder a world order adequately caring about all individuals, or about all sentient beings, and so on. There are a number of possible ways to have human interests represented in a robust, invariant, non-anthropocentric fashion.
We do see a slowly accelerating takeoff. We do notice the acceleration, and I would not be surprised if this acceleration gradually starts being more pronounced (as if the engines are also gradually becoming more powerful during the takeoff).
But we don’t yet seem to have a system capable of non-saturating recursive self-improvement if people stop interfering with its functioning and just retreat into supporting roles.
What’s missing is mostly that models don’t yet have sufficiently strong research taste (there are some other missing ingredients, but those are probably not too difficult to add). And this might be related to them having excessively fragmented world models (in the sense of https://arxiv.org/abs/2505.11581). These two issues seem to be the last serious obstacles which are non-obvious. (We don’t have “trustworthy autonomy” yet, but this seems to be related to these two issues.)
One might call the whole Anthropic (models+people+hardware+the rest of software) a “Seed AI equivalent”, but without its researchers it’s not there yet.
It sure looks like Metaspeed is smuggling tens of thousands Blackwell chips worth billions of dollars straight into China, or at least they’re being used by Chinese firms, and that Nvidia knew about this. Nvidia and Metaspeed claim this isn’t true throughout the post, but I mean who are you kidding.
MegaSpeed, actually, not “Metaspeed”: https://megaspeed.ai/
They seem to be relatively big, but have no Wikipedia page. Their ownership history seems to be quite complicated (it looks like they were created as a Singapore-based subsidiary of a Chinese org in 2023, and then transferred from that Chinese org elsewhere, also in 2023). Right now visiting their website triggers a pop-up denying the allegations; other than that it’s a rather shiny site of a data center provider.
On one hand, this is an astute observation: cancer (and also aging and mortality in general) is used in a similar fashion to “think of the children” (to justify things which would be much more difficult to justify otherwise).
That’s definitely the case.
However, there are two important object-level differences, and those differences make this analogy somewhat strained. Both of these differences have to do with the “libertarian dimension” of it all.
The opposition to “think of the children” is mostly coming from libertarian impulses, and as such this opposition notes that children are equally hurt (or possibly even more hurt) by “think of the children” measures. So the “ground case” for “think of the children” is false; those measures are not about protecting the children, but about establishing authoritarian controls over both children and adults.
Here is the first object-level difference. Unlike “think about the children”, “let’s save ourselves from cancer” is not a fake goal. Most of us are horrified and tired of seeing people around us dying from cancer, and are rather unhappy about their own future odds in this sense. (And don’t even let me start expressing what I think about aging, and about our current state of anti-aging science. We absolutely have to defeat obligatory aging ASAP.)
And that’s a rather obvious difference. But there is another difference, also along the dimension of “libertarian values”. “Think of the children” is about imposing prohibition and control, about not letting people (children and adults) do what they want.
Here we are not talking about some evil AI companies trying to prohibit people from doing human-led research. We are talking about people wanting to restrict and prohibit creation of AI scientists.
So, in this sense, it is a false analogy. Mentioning the badness of the “think of the children” approach first of all appeals to the libertarian impulse within us, the impulse which reminds us how bad those restrictive measures are, how costly they are for all of us.
The same libertarian impulse reminds us that in this case the prohibitionist pressure comes from the other side. And yes, a case, and perhaps even a very strong case, can be made for the need to restrict certain forms of AI. But I don’t think it makes sense to appeal to our libertarian impulse here.
Yes, it might be necessary to impose restrictions, but let’s at least not pretend that imposing those restrictions is somehow libertarian. (And no, we have to find a way to impose those restrictions in such a fashion that they are consistent with rapid progress against cancer and aging. Sorry to say, but it’s intolerable to keep having so much of both cancer and aging around us. We really can’t agree to postpone progress in these two areas, the scale of ongoing suffering and loss of life is just too much.)
Right, any “global destruction” where nothing is left is compatible with the Fermi paradox. The exact nature of destruction does not matter, only that it’s sufficiently total.
Another route would be evolution of super entities into something we can’t detect (even by the traces of their activity). That’s also compatible with the Fermi paradox (although the choice to avoid big astroengineering and to go for different and more stealthy routes is interesting).