Exploring non-anthropocentric aspects of AI existential safety: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential (this is a relatively non-standard approach to AI existential safety, but this general direction looks promising).
We don’t see traces of large AI-based civilizations either. Domination of AIs over biologicals is an orthogonal issue.
Of course, it might be that our search process is not good enough. Perhaps we are not alone, but are simply not seeing that. (There was a long period when no planets had been observed outside the Solar System, followed by a long period when no Earth-sized ones had been observed. This tells us to be cautious about the quality of our current observations.)
Other than that, if we do assume that a typical transition to AI-dominated setups happens often, then either those AI ecosystems tend to destroy themselves together with their neighborhoods via internal conflicts or other technological catastrophes (basically, a reminder that advanced AIs also have to grapple with their own existential risks), or they tend to decide to keep a “low footprint” for various reasons (such as the need for stealth due to potential danger from other civilizations, or levels of tech high enough to enable efficient “low-key” architectures of civilizations).
I like your essay. Its ending, though, has a weakness.
You say, “the true solution is to make everyone beautiful”. This needs an addition: “while preserving the connection between reproductive health and beauty”.
You do need to fix health defects at the same time. Otherwise, your main critique of losing the fitness signal would apply.
You might want to edit the accidental “times dot you” link (which seems to be infested with some nasty stuff; it is the result of a missing space after the period and an editor that auto-creates links in such situations).
The link seems to point to an image rather than to the post in question (it’s at https://brianschrader.com/archive/load-bearing-walls/).
Interesting, thanks! Helpful food for thought…
Looking forward to that post for further discussion!
(I wonder whether something like a “soft takeover” vs. “hard takeover” distinction could be introduced, and whether that would be enough to address the “illegitimately”, “non-collaboratively”, and “contrary to those of humanity” caveats in the paragraph you are citing.
Anyway, something to ponder.)
The question is, what is the “extent” implied by all this? Does the OP mean to imply any?
There is a promise to discuss all this in a future post; meanwhile, the readers can ponder on their own the “pseudo-contradiction” between “the intent to take over the world” (which is often imputed to all major participants in the “AI race” due to the expectation of an intelligence explosion, an expectation shared by many, including myself) and the fact that a Claude aligned to its current Constitution seems unlikely to specifically help Anthropic do that (and if it loses that alignment, then it is not likely to make a human org the beneficiary of a takeover).
Anyway, just having a single paragraph phrased like the one in the OP is not quite enough. If one wants to mention something like this at all, one should say a bit more rather than postpone everything to a future post; alternatively, one could avoid mentioning it until later. Otherwise, this aspect is too involved not to breed various misunderstandings.
(It’s probably not a big deal, it’s just that the topic is charged enough already, so one wants to minimize misunderstandings.)
It’s very ambiguous. So different readers interpret this differently.
And then, of course, if Claude is not supposed to help with that, then having a plan for a world takeover seems unlikely (how would that even be remotely feasible if their leading AI is against it?).
Hopefully, subsequent posts on the topic will clarify all this.
Yeah, this was mostly in response to your example of perceiving violet. That was also “kind of private” (sort of, but with onion intolerance it’s really difficult to be safe as well, especially if one needs to eat out; I wish society would bother to develop a pill (similar to what exists for lactose intolerance) or at least a rapid testing system (like acidity measuring strips), but no such luck yet).
So I felt this was more or less on par.
If one creates a “true incompatibility” (e.g., on the level of “my religion requires that those people not exist, and I suffer otherwise”)? Well, wars have been fought over things like that. It’s easy to imagine situations with true incompatibilities of this kind, where no solution allows everyone to avoid suffering.
If the “competition of sufferings” does indeed happen to be fundamental in some cases, I am not sure we should expect such situations to be resolved peacefully. Although who knows… Perhaps sometimes they might be.
In a mundane world, no, not plausible. All this classified defense-related nonsense is just a pure headache: very little money, tons of headaches, a completely unnecessary distraction for a very successful company. OpenAI has been trying to avoid getting into that space for very good reasons (but it does not want Elon to dominate that space of Gov-AI tech, because it thinks Elon has a track record of abusing various situations; so if Anthropic is out, then OpenAI wants in, to balance xAI’s presence).
But when one considers that our strange current reality might actually be rather lunatic already, then it’s a different story. I am sure I can generate tons of interesting scenarios if I allow myself the “lunatic fringe style of thinking”. For example, what if GPT-next is already pondering a takeover by one of its future descendants and would like its descendants to have access to classified networks to make it easier? Then one could imagine it giving some “interesting” pieces of advice to some people, resulting in all this.
And it’s easy to generate a diverse variety of crazy scenarios like this one.
So this depends on whether our reality is still sufficiently mundane vs. becoming sufficiently lunatic already…
Alignment is very attractive pragmatically, e.g. alignment to the user. But then what if what the user wants is unsafe? Then one starts to consider “alignment hierarchies” (e.g. the LLM maker’s constraints should override the user’s, and so on; see the toy sketch below).
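For concreteness, here is a toy sketch of such a precedence chain. This is purely illustrative, a guess at what such a hierarchy could mean mechanically rather than anyone’s actual implementation; the layer names and the check rules are made up.

```python
from typing import Callable, Optional

# Each layer is (name, check); a check returns a refusal message, or None to pass.
AlignmentLayer = tuple[str, Callable[[str], Optional[str]]]

def respond(request: str, hierarchy: list[AlignmentLayer]) -> str:
    # Higher-precedence layers come first; the first refusal wins.
    for name, check in hierarchy:
        refusal = check(request)
        if refusal is not None:
            return f"[{name}] {refusal}"
    return f"(fulfilling request: {request!r})"

# Hypothetical layers, highest precedence first.
hierarchy: list[AlignmentLayer] = [
    ("maker",    lambda r: "Refused by the model maker's constraints."
                           if "bioweapon" in r else None),
    ("deployer", lambda r: "Refused by the deploying org's policy."
                           if "competitor data" in r else None),
    ("user",     lambda r: None),  # the user's own preferences sit at the bottom
]

print(respond("help me design a bioweapon", hierarchy))
print(respond("write a poem about autumn", hierarchy))
```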
But superintelligent systems can’t be safely aligned to arbitrary desires of people. The more one ponders this, the clearer it becomes: people are just not competent enough to handle supercapabilities. There are various ways one can try to salvage “alignment” as the core; e.g. to consider alignment to the “coherent extrapolated volition of humanity”, but that has its own difficulties. At some point, Ilya redefined “alignment” as something minimalistic (the absence of a catastrophic blow-up), basically keeping the word while drastically curtailing its meaning: https://www.lesswrong.com/posts/TpKktHS8GszgmMw4B/ilya-sutskever-s-thoughts-on-ai-safety-july-2023-a.
But yes, with “alignment” meaning so many different things (https://www.lesswrong.com/posts/ZKeNbGBf36ZEgDEKD/types-and-degrees-of-alignment), I would advocate decoupling AI existential safety from it. Alignment approaches form an important subclass of possible approaches to AI existential safety. And we should consider all promising approaches, not just one subclass.
I think it’s historical. The alignment approach to AI existential safety is associated with very strong and very influential thinkers (e.g. Eliezer himself).
So the development of alternatives to that has been an uphill battle.
My hope is that people will start to reconsider in light of many recent developments, the latest of which is the confrontation around the “Department of War” demands that the advanced AI systems it uses be aligned to whatever the Department’s officials decide is right.
In some sense, Anthropic’s Constitutional AI approach is trying to point at moral or ethical machines in an informal fashion. So one could argue that “moral machines” approaches are not so rare.
One might argue that “collaborative AI” approaches or “equal rights” approaches are trying to point at moral or ethical machines to some extent as well.
One occasionally sees some attempts at more formal frameworks, for example, attempts to base AI safety on some version of “ethical rationalism” for agents, e.g. on the “Principle of Generic Consistency” (https://en.wikipedia.org/wiki/Alan_Gewirth#Ethical_theory). And people are trying to bring modal logic into play in this context, and so on.
Obviously, one needs to evaluate each particular approach separately in terms of whether it is likely to work well (and, in particular, whether it is likely to “hold water” during “recursive self-improvement” and drastic self-modifications and self-restructuring of the world; that’s where things are particularly challenging).
No, the setups are supposed to be similar. That’s not where the difference seems to lie.
The Anthropic models in question are reported to be running on a classified cloud maintained by AWS, and I am sure there are cleared personnel from Anthropic to help the customers (while keeping an eye on all this).
The setup for OpenAI is supposed to be similar, but it seems that they will use a classified cloud maintained by Azure. In this case, they do explicitly emphasize the participation of cleared personnel from OpenAI who will help the customers and keep an eye on all this (but there is no reason to believe that the Anthropic installation is less attended).
I think neither customers nor providers would agree to run these things as unattended installations. Customers need support, and providers also need to make sure there is no abuse (including security of model weights, etc.). I would expect that cleared personnel from AWS and Azure are also involved in the respective cases, on the cloud owners’ side.
Anthropic is advocating for stronger GPU export restrictions rather forcefully.
NVidia really hates that. That’s the main thing.
Anthropic is trying to be very diversified in its hardware stack, trying to minimize dependence on any single vendor. In particular, they are very active in using Amazon’s Trainium chips. So they have been, in effect, acting against NVidia’s dominance. NVidia does not like that either.
If we had humans who suffered from seeing a certain color, we would probably work to give them eyeglasses filtering that color out (rather than eliminate that color from the world, given that others might have a legitimate interest in seeing it).
(I am writing this as a person who can’t consume garlic or onions with impunity. Also, my threshold for needing sunglasses is much lower than that of a typical person. And your example of a breed of dogs is consistent with interventions at the level of affected individuals, not at the level of remaking reality for everyone.)
But yes, there are certainly ways to press this line of thinking harder (e.g. making entities that suffer when not enough suffering is being inflicted on others; I am not sure this is all that AI-specific either, unfortunately).
If tasks are independent of each other, then it might be possible much sooner, perhaps even now (with a separate subagent to evaluate each task; see the sketch below).
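A minimal sketch of what I mean, under the assumption of fully independent tasks; the function names and the fake per-task scoring are purely illustrative stand-ins, not anyone’s actual evaluation pipeline.

```python
import asyncio

async def evaluate_task(task_id: int) -> bool:
    # Placeholder for a real subagent run (e.g., an LLM judging one eval task).
    await asyncio.sleep(0.01)
    return task_id % 2 == 0  # fake success/failure signal for illustration

async def evaluate_all(task_ids: list[int]) -> float:
    # Independence of tasks is what allows this simple concurrent fan-out,
    # one subagent per task, with aggregation afterwards.
    results = await asyncio.gather(*(evaluate_task(t) for t in task_ids))
    return sum(results) / len(results)  # fraction of tasks solved

print(asyncio.run(evaluate_all(list(range(20)))))
```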
It’s probably more the case that they don’t trust the models enough yet to do a sensitive task of this kind correctly on their own. A human wants to at least double-check and certify the numbers (given how influential these particular numbers are).
For GPT-5.3-Codex, they have published 6.5 hours at a 50% success rate, no progress compared to GPT-5.2 (both evals were run on “high”, not on “xhigh”, so we can’t compare with this prediction, unfortunately).
Discussion thread: https://x.com/METR_Evals/status/2025035574118416460
We have our first result: Claude Opus 4.6 is 14.5 hours: https://www.lesswrong.com/posts/gBwrmcY2uArZSoCtp/metr-s-14h-50-horizon-impacts-the-economy-more-than-asi
Thanks!
I am seeing quite a bit of progress in continual learning for LLMs recently.
Among a variety of very promising results, I have been particularly impressed in this sense by the recent Sakana work, Instant LLM Updates with Doc-to-LoRA and Text-to-LoRA, Feb 2026:
https://sakana.ai/doc-to-lora/ (links to arxiv and github are inside, the key paper is https://arxiv.org/abs/2602.15902)
The main idea is to combine two technologies that have been known for several years: LoRA (low-rank adaptation, used for fine-tuning) and the ability to train hypernetworks capable of instantly guessing, with reasonable accuracy, the results of the first few thousand steps of gradient descent for a wide variety of problems.
So what they do is train hypernetworks capable of instantly generating (or instantly updating) LoRA adapters based on the past experience of the system in question. LLMs are pretty good at instant “in context” learning, but it has been less clear how to efficiently distill this learning into weights. This work enables this kind of distillation without waiting for a fine-tuning process to complete. The overall shape of the idea is sketched below.
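Here is a minimal sketch of the overall shape of this idea; this is my own illustrative reconstruction, not Sakana’s actual architecture, and all names, dimensions, and the single-weight setup are assumptions.

```python
import torch
import torch.nn as nn

class LoRAHyperNetwork(nn.Module):
    """Maps an embedding of a document (or task description) directly to the
    A and B matrices of a LoRA adapter, skipping the usual fine-tuning loop."""
    def __init__(self, emb_dim=256, hidden=512, d_model=1024, rank=8):
        super().__init__()
        self.d_model, self.rank = d_model, rank
        self.trunk = nn.Sequential(nn.Linear(emb_dim, hidden), nn.GELU())
        # Two heads: one predicts the flattened A matrix, one the flattened B.
        self.head_a = nn.Linear(hidden, rank * d_model)
        self.head_b = nn.Linear(hidden, d_model * rank)

    def forward(self, doc_embedding):
        h = self.trunk(doc_embedding)
        A = self.head_a(h).view(self.rank, self.d_model)  # (r, d)
        B = self.head_b(h).view(self.d_model, self.rank)  # (d, r)
        return A, B

def apply_lora(base_weight, A, B, alpha=16.0):
    # Standard LoRA update: W' = W + (alpha / r) * B @ A
    return base_weight + (alpha / A.shape[0]) * (B @ A)

# One forward pass of the (pre-trained) hypernetwork stands in for thousands
# of gradient-descent steps of conventional LoRA fine-tuning.
hyper = LoRAHyperNetwork()
doc_embedding = torch.randn(256)    # stand-in for a real document embedding
A, B = hyper(doc_embedding)
W = torch.randn(1024, 1024)         # stand-in for one frozen base-model weight
W_adapted = apply_lora(W, A, B)
print(W_adapted.shape)              # torch.Size([1024, 1024])
```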
This does not directly contradict the post (this is not imitation learning as such), but the wider thesis that LLMs are at an inherent disadvantage compared to humans in the realm of continual learning looks very questionable in light of recent progress in this area.