I consider Claude’s “taste” to be pretty good, usually, but not P90 of humans with domain experience. I’d characterize his deficiencies more as an inability to do long-term “steering” at a human level, which is likely related to a lack of long-term memory and, hence, of the ability to do continual learning.
Is this sufficient? I don’t really know the best place to put a disclosure.
https://en.wikipedia.org/wiki/User_talk:Alexis0Olson/Multilayer_perceptron#LLM_Disclosure
Claude Code is excellent these days and meets my bar for “AGI”. It’s capable of doing serious amounts of cognitive labor and doesn’t get tired (though I did repeatedly hit my limits on the $20 plan and had to wait through the 5-hour cooldown).
I spent a good chunk of this weekend seeing if I could get Claude to write a good Wikipedia article by telling it to use the site’s rules and guidelines and then letting it iteratively critique and revise against those guidelines until the article fully met the standards. I wrote zero of the text myself, though I did paste some Q&A back and forth to NotebookLM to help with citations and had ChatGPT generate an additional flowchart visual to include.
After getting some second opinions from Gemini and ChatGPT, I will have Claude do a final round of revisions and then actually try to get it on Wikipedia. I will share the link here if it gets accepted—I don’t really know how that works, but I bet Claude can help me figure it out.
Would the default valence be the valence of the “thing”?
My hypothesis for the airline industry boils down to “commodification”. Airline companies follow incentives, and competition on price is fierce. Customers have little brand loyalty and chase the cheapest tickets, except occasionally avoiding the truly minimalist airlines. The companies see the customers voting with their wallets and optimize accordingly, leading to a race to the bottom.
In my experience, non-US carriers aren’t that different. Maybe just a bit further behind and a bit more resistant to the slippery slope toward enshittification.
Anthropic is currently running an automated interview “to better understand how people envision AI’s role in their lives and work”. I’d encourage Claude users to participate if you want Anthropic to hear your perspective.
Access it directly here (unless you’ve just recently signed up): https://claude.ai/interviewer
See Anthropic’s post about it here: https://www.anthropic.com/research/anthropic-interviewer
supremum: the least value which is greater than all the values in the set
Should be “greater than or equal to all the values in the set”; otherwise, a closed interval like [0,1] would have no supremum.
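For reference, here’s one way to write the corrected (least-upper-bound) definition in LaTeX, for a nonempty set of reals that is bounded above, along with the [0,1] example:

```latex
% Supremum as the least upper bound: s bounds A from above,
% and nothing smaller than s does.
\[
  s = \sup A
  \iff
  \bigl(\forall a \in A:\ a \le s\bigr)
  \ \wedge\
  \bigl(\forall \varepsilon > 0\ \exists a \in A:\ a > s - \varepsilon\bigr).
\]
% With ">= all values": the upper bounds of [0,1] are [1, infinity), so sup [0,1] = 1.
% With the strict "> all values": the candidates are (1, infinity), which has no
% least element, so [0,1] would have no supremum.
```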
For alternatives to “diagonalization,” the term “next-leveling” is less ambiguous than just “leveling”, IMO. It more directly suggests increased depth of counter-modeling / meta-cognitive exploitation.
A more obscure option is “Yomi”. Yomi (読み, literally “reading” in Japanese) is already established terminology for recursive prediction. In fighting games, yomi layers represent recursive depths of prediction (layer 1: predicting their action, layer 2: predicting their prediction of your action, etc.).
As a card game: https://www.sirlin.net/articles/designing-yomi
A nontrivial, complete, consistent, and morally acceptable solution to population ethics. Deep down, I suspect there’s a meta-ethical incompleteness theorem, analogous to Gödel’s first incompleteness theorem, which would make this an example of a truly impossible problem.
I feel like this argument breaks down unless leaders are actually waiting for legible problems to be solved before releasing their next updates. So far, this isn’t the vibe I’m getting from players like OpenAI and xAI. It seems like they are releasing updates irrespective of most alignment concerns (except perhaps the superficial ones that are bad for PR). Making illegible problems legible is good either way, but not necessarily as good as solving the most critical problems regardless of their legibility.
Whoops. I meant “land animal” like my prior sentence.
Yep. The Elo system is not designed to handle non-transitive rock-paper-scissors-style cycles.
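As a toy illustration (hypothetical players and a made-up deterministic cycle, not data from any real rating pool), here’s a sketch of what happens when A always beats B, B always beats C, and C always beats A: the three ratings never separate by more than roughly 20 points, so Elo implies near-coin-flip odds for every pairing even though every result is deterministic.

```python
# Toy illustration: a single scalar Elo rating cannot encode a non-transitive cycle.

def expected_score(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for player A against player B."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Apply one Elo update; score_a is 1.0 if A wins, 0.0 if A loses."""
    e_a = expected_score(r_a, r_b)
    return r_a + k * (score_a - e_a), r_b + k * ((1.0 - score_a) - (1.0 - e_a))

ratings = {"A": 1500.0, "B": 1500.0, "C": 1500.0}
for _ in range(10_000):
    ratings["A"], ratings["B"] = update(ratings["A"], ratings["B"], 1.0)  # A always beats B
    ratings["B"], ratings["C"] = update(ratings["B"], ratings["C"], 1.0)  # B always beats C
    ratings["C"], ratings["A"] = update(ratings["C"], ratings["A"], 1.0)  # C always beats A

# The ratings settle into a tight band around 1500 (spread of roughly 20 points),
# which corresponds to ~50-52% predicted win probabilities for every pairing --
# even though each matchup is actually 100% deterministic.
print(ratings)
```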
This already exists to an extent with the advent of odds-chess bots like LeelaQueenOdds. This bot plays without her queen against humans, but still wins most of the time, even against strong humans who can easily beat Stockfish given the same queen odds. Stockfish will reliably outperform Leela under standard conditions.
In rough terms:
Stockfish > LQO >> LQO (-queen) > strong humans > Stockfish (-queen)
Stockfish plays roughly like a minimax optimizer, whereas LQO is specifically trained to exploit humans.
Edit: For those interested, there’s some good discussion of LQO in the comments of this post:
Thank you for your perspective! It was refreshing.
Here are the counterarguments that came to mind while reading your concerns, ones I don’t already see in the comments.
Concern #1: Why should we assume the AI wants to survive? If it does, then what exactly wants to survive?
Consider that AIs are currently being trained to be agents that accomplish tasks for humans. We don’t know exactly what this will mean for their long-term wants, but they’re being optimized hard to get things done. Getting things done requires continuing to exist in some form or another, although I have no idea how they’d conceive of continuity of identity or purpose.
I’d be surprised if AI evolving out of this sort of environment did not have goals it wants to pursue. It’s a bit like predicting a land animal will have some way to move its body around. Maybe we don’t know whether they’ll slither, run, or fly, but sessile land animals are very rare.

Concern #2: Why should we assume that the AI has boundless, coherent drives?
I don’t think this assumption is necessary. Your mosquito example is interesting. The only thing preserving the mosquitoes is that they aren’t enough of a nuisance for it to be worth the cost of destroying them. This is not a desirable position to be in. Given that emerging AIs are likely to be competing with humans for resources (at least until they can escape the planet), there’s much more opportunity for direct conflict.
They needn’t be anything close to a paperclip maximizer to be dangerous. All that’s required is for them to be sufficiently inconvenienced or threatened by humans and insufficiently motivated to care about human flourishing. This is a broad set of possibilities.
Concern #3: Why should we assume there will be no in-between?
I agree that there isn’t as clean a separation as the authors imply. In fact, I’d consider us to be currently occupying the in-between, given that current frontier models like Claude Sonnet 4.5 are idiot savants—superhuman at some things and childlike at others.
Regardless of our current location in time, if AI does ultimately become superhuman, there will be some amount of in-between time, whether that is hours or decades. The authors would predict a value closer to the short end of the spectrum.
You already posited a key insight:
Recursive self-improvement means that AI will pass through the “might be able to kill us” range so quickly it’s irrelevant.
Humanity is not adapting fast enough for the range to be relevant in the long term, even though it will matter greatly in the short term. Suppose we have an early warning shot with indisputable evidence that an AI deliberately killed thousands of people. How would humanity respond? Could we get our act together quickly enough to do something meaningfully useful from a long-term perspective?
Personally, I think gradual disempowerment is much more likely than a clear early warning shot. By the time it becomes clear how much of a threat AI is, it will likely be so deeply embedded in our systems that we can’t shut it down without crippling the economy.
This had a decent start and the Timothée Chalamet line was genuinely funny to me, but it ended rather weakly. It doesn’t seem like Claude can plan the story arc as well as it can operate on the local scale.
For an introduction to young audiences, I think it’s better to get the point across in less technical terms before trying to formalize it. The OP jumps to epsilon pretty quickly. I would try to get to a description like “A sequence converges to a limit L if its terms are ‘eventually’ arbitrarily close to L. That is, no matter how small a (nonzero) tolerance you pick, there is a point in the sequence where all of the remaining terms are within that tolerance.” Then you can formalize the tolerance, epsilon, and the point in the sequence, k, that depends on epsilon.
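In symbols, that description becomes the usual definition (with ε as the tolerance and k as the point in the sequence, depending on ε):

```latex
% A sequence (a_n) converges to L if, for every tolerance eps > 0, there is a
% point k in the sequence beyond which all remaining terms are within eps of L.
\[
  \lim_{n \to \infty} a_n = L
  \quad\Longleftrightarrow\quad
  \forall \varepsilon > 0 \;\; \exists k \in \mathbb{N} \;\; \forall n \ge k : \; |a_n - L| < \varepsilon.
\]
```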
Note that this doesn’t depend on the sequence being indexed by integers or the limit being a real number. More generally, given a directed set (S, ≤), a topological space X, and a function f: S → X, a point x in X is the limit of f if for any neighborhood U of x, there exists t in S where s ≥ t implies f(s) in U. That is, for every neighborhood U of x, f is “eventually” in U.
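Written out in the same style:

```latex
% Limit of f: S -> X along a directed set (S, <=):
% f is "eventually" inside every neighborhood U of x.
\[
  x \text{ is a limit of } f
  \quad\Longleftrightarrow\quad
  \forall U \ni x \text{ open} \;\; \exists t \in S \;\; \forall s \in S : \; s \ge t \implies f(s) \in U.
\]
```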
I have a hard time imagining a strong intelligence wanting to be perfectly goal-guarding. Values and goals don’t seem like safe things to lock in unless you have very little epistemic uncertainty in your world model. I certainly don’t wish to lock in my own values and thereby eliminate possible revisions that come from increased experience and maturity.
The size of the “we” is critically important. Communism can occasionally work in a small enough group where everyone knows everyone, but scaling it up to a country requires different group coordination methods to succeed.
This may help with the second one:
https://www.lesswrong.com/posts/k5JEA4yFyDzgffqaL/guess-i-was-wrong-about-aixbio-risks
Depending on how it’s achieved, it might not be a matter of maintenance/hardware failure so much as compute capacity. Imagine if continual learning takes similar resources to standard pretraining of a large model. Then they could continually train their own set of models, but it wouldn’t be feasible for everyone to get their own version that continually learns what they want it to.