I agree this sounds plausible, and I appreciate the distinction you’re drawing between still competing for the future and no longer racing for software-intelligence-explosion AGI.
Ben Goldhaber
xAI seems to be dropping out of the frontier AGI race.
“xAI will be dissolved as a separate company, so it will just be SpaceXAI, the AI products from SpaceX.” (Musk on X, May 6)
xAI was acquired in Feb 2026 through a reverse triangular merger, keeping it separate with its own debt, liabilities, and structure. It’s now being folded into SpaceX as a division focused on integrating AI across the SpaceX stack.
Also 11 of the 12 original co-founders have left, Grok DAU fell from 13.9M to 12.2M March→April while Claude went 16M→23M, and xAI burned $7.8B in the first 9 months of 2025.
They are also letting Anthropic use all of Colossus 1 for inference, for an estimated $3-4B/yr.
Even if Colossus 2 is the new training cluster, handing Anthropic 220k GPUs, which frees Anthropic to redirect its own compute toward training and research, is not what you’d do in a tight race for AGI.
It’s possible this is all a play for the IPO, to get a better narrative for raising huge sums, and that they turn xAI into a GDM-like organization. And Colossus 2 still has the biggest compute capacity of any cluster in the world, so it certainly can train big models.
But I think it’s not at all trivial to integrate a company, take on all its debt and obligations, change its corporate reporting structure with a very distinct mission, while continuing to do frontier AGI development that’s going to take off.
Also filings haven’t reflected this yet because the only one that exists is the April 1 confidential draft. The public S-1 is expected the week of May 18-22, working back 15 days from the targeted June 8 roadshow.
It’s much, much easier now to automate mechanisms like liquid democracy; you could run experiments inside organizations that would be (a) fun and (b) a test of practicality. Google ran an experiment in 2015 using this to select snacks; do it again but with AI delegates either representing your preferences or making snack decisions.
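To give a sense of how little machinery the core mechanism needs, here is a minimal Python sketch of a liquid democracy tally. The voter names, the snack options, and the rule that cycles and dead-end delegations simply go uncounted are all assumptions for illustration; I don't know how Google's experiment handled those cases. An AI delegate would just be another entry that casts or receives votes.

```python
from collections import Counter


def resolve_vote(voter, direct_votes, delegations):
    """Follow the delegation chain from `voter` until someone voted directly.

    Returns the chosen option, or None if the chain ends without a vote
    or loops back on itself.
    """
    seen = set()
    current = voter
    while current not in direct_votes:
        if current in seen or current not in delegations:
            return None  # cycle or dead end: this vote is not counted
        seen.add(current)
        current = delegations[current]
    return direct_votes[current]


def tally(voters, direct_votes, delegations):
    """Count one vote per voter after resolving delegations."""
    counts = Counter()
    for voter in voters:
        choice = resolve_vote(voter, direct_votes, delegations)
        if choice is not None:
            counts[choice] += 1
    return counts


# Hypothetical snack election: names and options are made up.
voters = ["ana", "bo", "cy", "di", "ed", "flo"]
direct_votes = {"ana": "pretzels", "bo": "fruit"}
delegations = {
    "cy": "ana",   # cy trusts ana's snack judgment
    "di": "cy",    # transitive: di -> cy -> ana
    "ed": "flo",   # ed and flo delegate to each other (a cycle)
    "flo": "ed",
}

result = tally(voters, direct_votes, delegations)
print(result.most_common())
```

In this made-up example, di's vote flows through cy to ana, so pretzels gets three votes and fruit one, while ed and flo delegate to each other and drop out.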
I quite like this framing, and think Strategic Competence is a useful term and concept. I explored a related idea in Wise AI Advisors at the Hinge of History:
I posit that if:
- We have trusted AI systems that people turn to as advisors
- And the trust in the AI advisors is well placed because they have good epistemics
  - Where “good epistemics” roughly means they consistently use reliable methods to figure out what is true and avoid self-deception
- And the AI advisors have shared epistemics
  - Where “shared epistemics” roughly means there is a shared foundation that allows different AI advisors and people to trust one another’s reasoning.
Then it implies that those AI Advisors would advise their Principals to avoid an Intelligence Explosion—if and only if this is in fact a real danger—and humanity could coordinate around this advice.
I expect Strategic Competence to largely track general model capabilities, whereas shared trusted epistemics requires more deliberate work on validation, auditing, and institution-building that won’t happen by default.
This overlaps with your points on improving AI philosophical competence, but with more focus on making epistemics verifiable and legible across different systems and actors, which is what I think would be needed (alongside model improvements) to enable guidance that people follow, and common knowledge needed for preference cascades, to get many actors to agree with wise AI advisors to prevent RSI takeoffs.
$50,000 FRC legal defense insurance, access to our specialized attorneys, and a cryptographically signed Safety Log that has precedent in state courts as admissible evidence of your child’s continuous supervision via Lifelink
I like this idea of insurance here—it would help with some of my fears—and you could imagine the company building this giving higher defense budgets at the start to help fight court cases that would establish precedent that Lifelink is suitable monitoring.
Congratulations to Anna and the team for cohering around a vision and set of experiments. I donated to the new CFAR; I hope you continue posting about what you learn through the upcoming workshops.
One {note? suggestion? “real spirit” discussion point?} - I feel like the framing of aCFAR was missing something important about the state of rationality today. Namely, from this year (2026) onward, being more rational is unlikely to be a “human technique”-only affair. It will look more like cyborgs and centaurs: humans using AI tools and agents in different configurations and ways to make better decisions.
I won’t belabor how good the AIs have gotten, and instead will just note that they are effective aids for rationalist techniques:
I wrote a post about backchaining where I had Claude create malleable, customizable timelines. I found this to be a really effective way to “feel” at the S1 level the constraints and targets.
They’re very good at making Fermi estimates.
There’s ongoing research and experiments into using them as mediators and for fostering cooperation, à la double crux.
They’re probably useful for Focusing and internal work too (I know the Jhourney team has been running experiments here, though I haven’t found it that effective personally).
I appreciate that it’s a Center for Applied Rationality, and maybe this particular center doesn’t need to think about the cyborg angle and can just focus on developing better models of “who-ness”. Maybe a different center should!
But it seems valuable to consider, to the extent you want to push forward the frontier of rationality. I suspect there’s some connection between the moments when AI meaningfully aids my real thinking, the moments when I’m doing slop-ful fake thinking where the AI is aiding my delusions, and the concept you’re defining as “who-ness.” Who-ness seems adjacent to taste, which might matter a lot for steering AI fleets towards goodness and meaningful concepts. And it’s probably the case that the general rationality techniques you’re imparting and working on with attendees could be more effective with AI assistance.
In Jack Clark’s Import AI 439, he references a new paper, Universally Converging Representations of Matter Across Scientific Foundation Models:
> Do AI systems end up finding similar ways to represent the world to themselves? Yes, as they get smarter and more capable, they arrive at a common set of ways of representing the world.
> The latest evidence for this is research from MIT which shows that this is true for scientific models and the modalities they’re trained on: “representations learned by nearly sixty scientific models, spanning string-, graph-, 3D atomistic, and protein-based modalities, are highly aligned across a wide range of chemical systems,” they write. “Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in representation space as they improve in performance, suggesting that foundation models learn a common underlying representation of physical reality.”
> As with other studies of representation, they found that as you scale the data and compute models are trained on, “their representations converge further”.
This seems like useful empirical evidence for the Natural Abstraction Hypothesis (I haven’t been following progress on that research agenda, so I don’t know how significant an update this is).
Thank you for the comment Saul—I agree with a lot of your points, in particular that “explosive” periods are costly and inefficient (relative perhaps to some ideal), and that they are not in and of themselves a solution for long-term retention.
I expect if we have a crux it’s whether someone who intends to follow an incremental path vs someone who does an intense acquisition period is more likely to, ~ a year later, actually have the skill. And my guess is, for a number of reasons, it’s the latter; I’d expect a lot of incrementalists to ‘just not actually do the thing’.
* My ideal strategy would be “explore lightly a number of things, to determine what you want → explode towards that for an intense period of time → establish incremental practices to maintain and improve”
* Your comment also highlighted for me something that I had cut from the initial draft: my belief that explosive periods help overcome emotional blockers, which I think might be a big part of why people shy away from skills they say they want.
No worries, I appreciate the perspective. I agree that for many skills there is a consolidation and rest period that is needed. An obvious example is that you can’t cram all of the effort needed to build muscle into one week and expect the same kinds of returns that you would get over many months. Though, I do expect you could master the biomechanical skills of weightlifting much faster with that attitude!
If you have examples of the multidimensional learning schedule, I’d love to hear them. I’m imagining something like {30 minutes of Spanish-language shows}?
thanks for sharing a data point on that claim
Thank you for the post! I have also been (very pleasantly!) surprised by how aligned current models seem to be.
I’m curious if you expect ‘alignment by default’ to continue to hold in a regime where continuous learning is solved and models are constantly updating themselves/being updated by what they encounter in the world?
Chain of Thought not producing evidence of scheming or instrumental convergence does seem like evidence against, but it seems quite weak to me as a proxy for what to expect from ‘true agents’. CoT doesn’t run long enough or have the type of flexibility I’d expect to see in an agent that was actually learning over long time horizons, which would give it the affordance to contemplate the many ways it could accomplish its goals.
And, while this is just speculation, I imagine that the kind of training procedures we’re doing now to instill alignment will not be possible with continuous learning, or we’ll have to pay a heavy alignment tax to do that for these agents. Note Jan’s recent tweet giving his impression that it is quite hard to align large models and that alignment doesn’t fall out for free from size.
Thank you! I had not read this Kevin Simler essay but I quite like it, and it does match my perspective.
I did! And I have in fact read the whitepaper (well, some of it :)). But it still seems weird that it’s not possible to Increase the Trust in the third party through financial means or dramatic PR stunts (the auditor promises to commit seppuku if they are found to have lied).
Source needed, but I recall someone on the Community Notes team saying it was very similar but that there are some small differences between prod and the open-source version (it’s difficult to maintain exact compatibility). For the point of the comment and context, I agree open source does a good job of this, though given the number of people on Twitter who still allege it’s being manipulated, I think you need some additional juice (a whistleblower prize?).
Why so few third-party auditors of algorithms? For instance, you could have an auditing agency make specific assertions about what the Twitter algorithm is doing, such as whether Community Notes is ‘rigged’.
It could be that the codebase is too large, too many people can make changes, and it’s too hard to verify that the algorithm in production is stable. This seems unlikely to me with most modern devops stacks.
It could be that no one will trust the third-party agency. I guess this seems most likely… but really, have we even tried? Could we not have some group of monk-like auditors who would rather die than lie? (My impression is that some cyber professionals have this ethos already.)
If Elon wanted to spend a couple hundred thousand on insanely committed, high-integrity auditors, it’d be a great experiment.
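As a rough illustration of the "verify what's in production" piece, here is a minimal Python sketch in which an auditor fingerprints the source tree they reviewed and compares it with a fingerprint of the deployed tree. The file names and the `tree_digest` helper are hypothetical. It only shows whether two source trees are byte-identical; it says nothing about whether the running service was actually built from that source, which is the harder part.

```python
import hashlib
import tempfile
from pathlib import Path


def tree_digest(root: Path) -> str:
    """Return a SHA-256 digest over every file under `root`.

    Files are visited in sorted order and each relative path is mixed
    into the hash, so renames and content changes both alter the result.
    """
    h = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix().encode()
        data = path.read_bytes()
        # Length-prefix each field so adjacent fields cannot be confused.
        h.update(len(rel).to_bytes(8, "big") + rel)
        h.update(len(data).to_bytes(8, "big") + data)
    return h.hexdigest()


def write_tree(root: Path, files: dict) -> None:
    """Helper for the demo: write {relative_path: text} under `root`."""
    for rel, text in files.items():
        target = root / rel
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(text)


with tempfile.TemporaryDirectory() as audited, tempfile.TemporaryDirectory() as prod:
    source = {"ranker/score.py": "WEIGHT = 1.0\n"}  # made-up file
    write_tree(Path(audited), source)
    write_tree(Path(prod), source)
    matches_before = tree_digest(Path(audited)) == tree_digest(Path(prod))

    # Simulate an unaudited change shipped to production.
    write_tree(Path(prod), {"ranker/score.py": "WEIGHT = 2.0\n"})
    matches_after = tree_digest(Path(audited)) == tree_digest(Path(prod))

print(matches_before, matches_after)
```

An auditor could publish the digest alongside their assertions, so anyone with access to the deployed tree can recompute it and check for drift.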
epistemic status: thought about this for like 15 minutes + two deep research reports
A contrarian pick for an underrated technology area is lie detection through brain imaging. It seems like it will become much more robust and ecologically valid through compute-scaled AI techniques, and it’s likely to be much better at lie detection than humans because we didn’t have access to images of the internals of other people’s brains in the ancestral environment.
On the surface this seems like it would be transformative: brain scan key employees to make sure they’re not leaking information! Test our leaders for dark triad traits (OK, that’s a bit different from specific lies, but still). However, there’s a cynical part of me, sounding like some combo of @ozziegooen and Robin Hanson, which notes that we already have methods (like significantly increased surveillance and auditing) that we could use for greater trust and that we don’t employ.
So perhaps this won’t be used except for the most extreme natsec cases, where there are already norms of investigations and reduced privacy.
Related quicktake: https://www.lesswrong.com/posts/hhbibJGt2aQqKJLb7/shortform-1#25tKsX59yBvNH7yjD
Good points! I agree that actual prototyping is necessary to see if an idea works, and as a demo it can be far more convincing. Especially w/ the decreased cost of building web apps, leveraging them for fast demos of techniques seems valuable.
AI for improving human reasoning seems promising; I’m uncertain whether it makes sense to invest in new custom applications, as maybe improvements in models are going to do a lot of the work.
I’m more bullish on investing in exploration of promising workflows and design patterns. As an example, a series of YouTube videos and writeups on using O3 as a forecasting aid for grantmaking, with demonstrations. Or a set of examples of using LLMs to aid in productive meetings, with a breakdown of the tech used and social norms that the participants agreed to.
- I think these are much cheaper to do in terms of time and money.
- A lot of epistemics seems to be HCI bottlenecked.
- Good design patterns are easily copyable, which also means they’re probably underinvested in relative to their returns.
- Social diffusion of good epistemic practices will not necessarily happen as fast as AI improvements.
- Improving the AIs themselves to be more truth seeking and provide good advice, with good benchmarks, is another avenue.

I imagine a fellowship for prompt engineers and designers, prize competitions, or perhaps retroactive funding for people who have already developed good patterns.
I think people should write a bunch of their own vignettes set in the AI 2027 universe. Small snippets of life predictions as things get crazy, on specific projects that may or may not bend the curve, etc.
These are great suggestions—do you have a sense/botec on how expensive it would be, end to end, to do a pessimization training run? Idk if the time/staff attention makes it a non-starter except for toy models.
(Of course, the question of whether it’s a non-starter is downstream of internal lab political will, but knowing the cost would inform how difficult it would be for interested safety team members to run such an experiment.)