A reminder for people who are unhappy with the current state of the Internet: you have the option of just using it less.
Nate Showell
A breakthrough in a model’s benchmark performance or in training/inference costs would usually be more commercially useful to an AI company than a breakthrough in alignment, but such breakthroughs are higher-hanging fruit than alignment ones, given how much work has already gone into model performance and efficiency. Alignment research will sometimes be more cost-effective for an AI company, especially for companies that aren’t big enough to run frontier-scale training runs and have to compete on some other axis.
I disagree that increasing engagingness is more commercially useful to AI companies than increasing alignment. In terms of potential future revenue, the big bucks are in agentic tool-use systems sold B2B (e.g., to automate office work), not in consumer-facing systems like chatbots. For B2B tool-use systems, engagingness doesn’t matter but alignment does, and that includes avoiding failure modes like scheming.
AI alignment research isn’t just for more powerful future AIs; it’s commercially useful right now. To have a product customers will want to buy, companies building AI systems need to make sure those systems understand and follow the user’s instructions, without doing things that would damage the user’s interests or the company’s reputation. An AI company doesn’t want its products to do things like leak the user’s personally identifying information or write code with hard-coded unit test results. In fact, alignment is probably the biggest current bottleneck on the commercial applicability of agentic tool-use systems.
My prediction:
Nobody is actually going to use it. The general public has already started treating AI-generated content as pollution instead of something to seek out. Plus, unlike human-created shortform videos, a video generated by a model whose training cutoff is (at best) several months in the past can’t tell you what the latest fashion trends are. The release of Sora-2 has led me to update in favor of the “AI is a bubble” hypothesis because of how obviously disconnected it is from consumer demand.
Maybe I’m just not the target audience, but I also didn’t feel like I got much out of this story. It just seemed like an exercise in cynicism.
A piece of pushback: there might not be a clearly defined crunch time at all. If we get (or are currently in!) a very slow takeoff to AGI, the timing of when an AI starts to become dangerous might be ambiguous. For example, you refer to early crunch time as the time between training and deploying an ASL-4 model, but the implementation of early possibly-dangerous AI might not follow the train-and-deploy pattern. It might instead look more like gradually adding and swapping out components in a framework that includes multiple models and tools. The point at which the overall system becomes dangerous might not be noticeable until significantly after the fact, especially if the lab is quickly iterating on a lot of different configurations.
The LLMs might be picking up the spiral symbolism from Spiral Dynamics.
I follow a hardline no-Twitter policy. I don’t visit Twitter at all, and if a post somewhere else has a screenshot of a tweet I’ll scroll past without reading it. There are some writers like Zvi whom I’ve stopped reading because their writing is too heavily influenced by Twitter and quotes too many tweets.
I was describing reasoning about idealized superintelligent systems as the method used in agent foundations research, rather than its goal. In the same way that string theory is trying to figure out “what is up with elementary particles at all,” and tries to answer that question by doing not-really-math about extreme energy levels, agent foundations is trying to figure out “what is up with agency at all” by doing not-really-math about extreme intelligence levels.
If you’ve made enough progress in your research that it can make testable predictions about current or near-future systems, I’d like to see them. But the persistent failure of agent foundations research to come up with any such bridge between idealized models and real-world systems has made me doubtful that the former are relevant to the latter.
For me, the OP brought to mind another kind of “not really math, not really science”: string theory. My criticisms of agent foundations research are analogous to Sabine Hossenfelder’s criticisms of string theory, in that string theory and agent foundations both screen themselves off from the possibility of experimental testing in their choice of subject matter: the Planck scale and very early universe for the former, and idealized superintelligent systems for the latter. For both, real-world counterparts (known elementary particles and fundamental forces; humans and existing AI systems) of the objects they study are primarily used as targets to which to overfit their theoretical models. They don’t make testable predictions about current or near-future systems. Unlike with early computer science, agent foundations doesn’t come with an expectation of being able to perform experiments in the future, or even to perform rigorous observational studies.
Building on what you said, pre-LLM agent foundations research appears to have made the following assumptions about what advanced AI systems would be like:
Decision-making processes and ontologies are separable. An AI system’s decision process can be isolated and connected to a different world-model, or vice versa.
The decision-making process is human-comprehensible and has a much shorter description length than the ontology.
As AI systems become more powerful, their decision processes approach a theoretically optimal decision theory that can also be succinctly expressed and understood by human researchers.
None of these assumptions ended up being true of LLMs. In an LLM, the world-model and decision process are mixed together in a single neural network instead of being separate entities. LLMs don’t come with decision-related concepts like “hypothesis” and “causality” pre-loaded; those concepts are learned over the course of training and are represented in the same messy, polysemantic way as any other learned concept. There’s no way to separate out the reasoning-related features to get a decision process you could plug into a different world-model. In addition, when LLMs are scaled up, their decision-making becomes more complex and inscrutable due to being distributed across the neural network. The LLM’s decision-making process doesn’t converge into a simple and human-comprehensible decision theory.
More useful. It would save us the step of having to check for hallucinations when doing research.
Another example of this pattern that’s entered mainstream awareness is tilt. When I’m playing chess and get tilted, I might think things like “all my opponents are cheating,” “I’m terrible at this game and therefore stupid,” or “I know I’m going to win this time, how could I not win against such a low-rated opponent.” But if I take a step back, notice that I’m tilted, and ask myself what information I’m getting from the feeling of being tilted, I notice that it’s telling me to take a break until I can stop obsessing over the result of the previous game.
Tilt is common, but also easy to fix once you notice the pattern of what it’s telling you and start taking breaks when you experience it. The word “tilt” is another instance of a hangriness-type stance that’s caught on because of its strong practical benefits—having access to the word “tilt” makes the pattern easier to notice.
It’s working now. I think the problem was on my end.
This strategy suggests that decreasing ML model sycophancy should be a priority for technical researchers. It’s probably the biggest current barrier to the usefulness of ML models as personal decision-making assistants. Hallucinations are probably the second-biggest barrier.
The new feed doesn’t load at all for me.
There’s another way in which pessimism can be used as a coping mechanism: it can be an excuse to avoid addressing personal-scale problems. A belief that one is doomed to fail, or that the world is inexorably getting worse, can be used as an excuse to give up, on the grounds that comparatively small-scale problems will be swamped by uncontrollable societal forces. Compared to confronting those personal-scale problems, giving up can seem very appealing, and a comparison to a large-scale but abstract problem can act as an excuse for surrender. You probably know someone who spends substantial amounts of their free time watching videos, reading articles, and listening to podcasts that blame all of the world’s problems on “capitalism,” “systemic racism,” “civilizational decline,” or something similar, all while their bills are overdue and dishes pile up in their sink.
This use of pessimism as a coping mechanism is especially pronounced in the case of apocalypticism. If the world is about to end, every other problem becomes much less relevant in comparison, including all those small-scale problems that are actionable but unpleasant to work on. Apocalypticism can become a blanket pretext for giving in to your ugh fields. And while you’re giving in to them, you end up thinking you’re doing a great job of utilizing the skill of staring into the abyss (you’re confronting the possibility of the end of the world, right?) when you’re actually doing the exact opposite. This usability as a coping mechanism, rather than anything related to preverbal trauma, is the more likely source of the psychological appeal of AI apocalypticism for many people who encounter it.
Another experiment idea: testing whether the reduction in hallucinations that Yao et al. achieved with unlearning can be made robust.
Do LLMs perform better at games that are later in the Pokemon series? If difficulty interpreting pixel art is what’s holding them back, it would be less of a problem when playing later Pokemon games with higher-resolution sprites.
No, don’t leak people’s private medical information just because you think it will help the AI safety movement. That belongs in the same category as doxxing people or using violence. Even from a purely practical standpoint, without considering questions of morality, it’s useful to precommit to not leaking people’s medical information if you want them to trust you and work with you.
And that’s assuming the rumor is true. Considering that this is a rumor we’re talking about, it likely isn’t.