Whoops. I meant “land animal” like my prior sentence.
Yep. The Elo system is not designed to handle non-transitive rock-paper-scissors-style cycles.
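As a toy illustration (my own sketch, using only the standard Elo update and a made-up three-bot cycle, nothing from the post): if A always beats B, B always beats C, and C always beats A, all three ratings settle near the same value, so Elo ends up predicting ~50% for matchups that are actually 100/0.

```python
import random

def expected(r_a, r_b):
    # Standard Elo expected score for player a against player b.
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

def update(r_a, r_b, score_a, k=16):
    # One Elo update; score_a is 1 if a won, 0 if a lost.
    e_a = expected(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1 - score_a) - (1 - e_a))

# Hypothetical deterministic cycle: A always beats B, B beats C, C beats A.
beats = {("A", "B"), ("B", "C"), ("C", "A")}
ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}

for _ in range(10_000):
    a, b = random.sample(["A", "B", "C"], 2)
    score_a = 1 if (a, b) in beats else 0
    ratings[a], ratings[b] = update(ratings[a], ratings[b], score_a)

# All three ratings hover near 1500, predicting ~50% for every matchup,
# even though each individual matchup is completely one-sided.
print(ratings)
```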
This already exists to an extent with the advent of odds-chess bots like LeelaQueenOdds. This bot plays without her queen against humans, but still wins most of the time, even against strong humans who can easily beat Stockfish given the same queen odds. Stockfish will reliably outperform Leela under standard conditions.
In rough terms:
Stockfish > LQO >> LQO (-queen) > strong humans > Stockfish (-queen)
Stockfish plays roughly like a minimax optimizer, whereas LQO is specifically trained to exploit humans.
Edit: For those interested, there’s some good discussion of LQO in the comments of this post:
Thank you for your perspective! It was refreshing.
Here are the counterarguments that came to mind while reading your concerns and that I don't already see in the comments.
Concern #1: Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
Consider the fact that AI are currently being trained to be agents to accomplish tasks for humans. We don’t know exactly what this will mean for their long-term wants, but they’re being optimized hard to get things done. Getting things done requires continuing to exist in some form or another, although I have no idea how they’d conceive of continuity of identity or purpose.
I’d be surprised if AI evolving out of this sort of environment did not have goals it wants to pursue. It’s a bit like predicting a land animal will have some way to move its body around. Maybe we don’t know whether they’ll slither, run, or fly, but sessile land animals are very rare.

Concern #2: Why should we assume that the AI has boundless, coherent drives?
I don’t think this assumption is necessary. Your mosquito example is interesting. The only thing preserving the mosquitoes is that they aren’t enough of a nuisance for it to be worth the cost of destroying them. This is not a desirable position to be in. Given that emerging AIs are likely to be competing with humans for resources (at least until they can escape the planet), there’s much more opportunity for direct conflict.
They needn’t be anything close to a paperclip maximizer to be dangerous. All that’s required is for them to be sufficiently inconvenienced or threatened by humans and insufficiently motivated to care about human flourishing. This is a broad set of possibilities.
Concern #3: Why should we assume there will be no in-between?
I agree that there isn’t as clean a separation as the authors imply. In fact, I’d consider us to be currently occupying the in-between, given that current frontier models like Claude Sonnet 4.5 are idiot savants—superhuman at some things and childlike at others.
Regardless of our current location in time, if AI does ultimately become superhuman, there will be some amount of in-between time, whether that is hours or decades. The authors would predict a value closer to the short end of the spectrum.
You already posited a key insight:
Recursive self-improvement means that AI will pass through the “might be able to kill us” range so quickly it’s irrelevant.
Humanity is not adapting fast enough for the range to be relevant in the long term, even though it will matter greatly in the short term. Suppose we have an early warning shot with indisputable evidence that an AI deliberately killed thousands of people. How would humanity respond? Could we get our act together quickly enough to do something meaningfully useful from a long-term perspective?
Personally, I think gradual disempowerment is much more likely than a clear early warning shot. By the time it becomes clear how much of a threat AI is, it will likely be so deeply embedded in our systems that we can’t shut it down without crippling the economy.
This had a decent start and the Timothée Chalamet line was genuinely funny to me, but it ended rather weakly. It doesn’t seem like Claude can plan the story arc as well as it can operate on the local scale.
For an introduction to young audiences, I think it’s better to get the point across in less technical terms before trying to formalize it. The OP jumps to epsilon pretty quickly. I would try to get to a description like “A sequence converges to a limit L if its terms are ‘eventually’ arbitrarily close to L. That is, no matter how small a (nonzero) tolerance you pick, there is a point in the sequence where all of the remaining terms are within that tolerance.” Then you can formalize the tolerance, epsilon, and the point in the sequence, k, that depends on epsilon.
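For reference, that informal description then formalizes to the usual statement (this is just the standard definition, with ε as the tolerance and k as the cutoff index):

```latex
a_n \to L
\quad\Longleftrightarrow\quad
\forall \varepsilon > 0,\ \exists k \in \mathbb{N}\ \text{such that}\ n \ge k \implies |a_n - L| < \varepsilon.
```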
Note that this doesn’t depend on the sequence being indexed by integers or the limit being a real number. More generally, given a directed set (S, ≤), a topological space X, and a function f: S → X, a point x in X is a limit of f if for any neighborhood U of x, there exists t in S such that s ≥ t implies f(s) is in U. That is, for every neighborhood U of x, f is “eventually” in U.
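In symbols (again, just the standard net-convergence definition, restating the sentence above):

```latex
f(s) \to x
\quad\Longleftrightarrow\quad
\text{for every neighborhood } U \text{ of } x,\ \exists t \in S\ \text{such that}\ s \ge t \implies f(s) \in U.
```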
I have a hard time imagining a strong intelligence wanting to be perfectly goal-guarding. Values and goals don’t seem like safe things to lock in unless you have very little epistemic uncertainty in your world model. I certainly don’t wish to lock in my own values and thereby eliminate possible revisions that come from increased experience and maturity.
The size of the “we” is critically important. Communism can occasionally work in a small enough group where everyone knows everyone, but scaling it up to a country requires different group coordination methods to succeed.
This may help with the second one:
https://www.lesswrong.com/posts/k5JEA4yFyDzgffqaL/guess-i-was-wrong-about-aixbio-risks
How about this one?
A couple more (recent) results that may be relevant pieces of evidence for this update:
A multimodal robotic platform for multi-element electrocatalyst discovery
“Here we present Copilot for Real-world Experimental Scientists (CRESt), a platform that integrates large multimodal models (LMMs, incorporating chemical compositions, text embeddings, and microstructural images) with Knowledge-Assisted Bayesian Optimization (KABO) and robotic automation. [...] CRESt explored over 900 catalyst chemistries and 3500 electrochemical tests within 3 months, identifying a state-of-the-art catalyst in the octonary chemical space (Pd–Pt–Cu–Au–Ir–Ce–Nb–Cr) which exhibits a 9.3-fold improvement in cost-specific performance.”
Generative design of novel bacteriophages with genome language models
“We leveraged frontier genome language models, Evo 1 and Evo 2, to generate whole-genome sequences with realistic genetic architectures and desirable host tropism [...] Experimental testing of AI-generated genomes yielded 16 viable phages with substantial evolutionary novelty. [...] This work provides a blueprint for the design of diverse synthetic bacteriophages and, more broadly, lays a foundation for the generative design of useful living systems at the genome scale.”
Would you like a zesty vinaigrette or just a sprinkling of more jargon on that word salad?
I had to reread part 7 of your review to fully understand what you were trying to say. It’s not easy to parse on a quick read, so I’m guessing Zvi, like me on my first pass, didn’t interpret the context and content correctly. On a first skim, it reads as a technical argument for why you disagree with the overall thesis, which makes things pretty confusing.
Which of these is brilliant or funny? They all look nonsensical to me.
I would argue that the statement “Making a future full of flourishing people is not the best, most efficient way to fulfill strange alien purposes” is nearly tautological for sufficiently established contextual values of “strange alien purposes”. What is less clear is whether any of those alien purposes could still be compatible with human flourishing, despite not being maximally efficient. The book and supplementary material don’t argue that they are incompatible, but rather that human flourishing is a narrow, tricky target that we’re super unlikely to hit without much better understanding and control than our current trajectory.
I read the transcript above but haven’t watched the trailer. IMO, there’s definitely more fawning throughout (not just the introduction) than is necessary.
I don’t perceive Ask vs Guess as a dichotomy at all. IMO, like almost every social, psychological, and cultural trait, it exists on a continuum. The number of echoes tracked may correlate with but does not predict Ask vs Guess. Guess cultures tend to be high-context, homogeneous, and collectivist with tight norms, but none of these traits is dichotomous either.
My own culture leans mostly toward Asking, but it’s not a matter of not caring or being unaware of echoes so much as an expectation of straightforward communication. I don’t ask for unreasonable things. I do ask for reasonable things with the understanding that people don’t like saying no, but aren’t obligated to say yes. The more demanding the ask, the more I consider the social implications. There is a cost to asking or being asked, but that’s the expected way to communicate.
I’m insufficiently knowledgeable about deletion base rates to know how astonished to be. Does anyone have an estimate of how many Bayes bits such a prediction is worth?
FWIW, GPT-5T estimates around 10 bits, double that if it’s de novo (absent in both parents).
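For anyone wanting to sanity-check that number: on the usual reading (my assumption about what’s meant here, not something GPT-5T spelled out), “Bayes bits” would be the log of the likelihood ratio the observation provides, so 10 bits corresponds to evidence roughly a thousand times more likely under the hypothesis than under its negation:

```latex
\text{bits of evidence} = \log_2 \frac{P(E \mid H)}{P(E \mid \lnot H)},
\qquad
10\ \text{bits} \;\Longleftrightarrow\; \text{a likelihood ratio of } 2^{10} = 1024.
```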
Can you provide some examples that you think are well-suited to RLaaS? Getting high-quality data to train on is a highly nontrivial task and one of the bottlenecks for general models too.
I can imagine a consulting service that helps companies turn their proprietary data into useful training data, which they then use to train a niche model. I guess you could call that RLaaS, though it’s likely to be more of a distilling and fine-tuning of a general model.
I feel like this argument breaks down unless leaders are actually waiting for legible problems to be solved before releasing their next updates. So far, this isn’t the vibe I’m getting from players like OpenAI and xAI. It seems like they are releasing updates irrespective of most alignment concerns (except perhaps the superficial ones that are bad for PR). Making illegible problems legible is good either way, but not necessarily as good as solving the most critical problems regardless of their legibility.