Regarding safe outcomes for superintelligence, your parenthetical remark is the one I believe most important. Far above any prosaic or theoretical safety work, our priority should be regulation preventing the development and release of superintelligence, at least until we have strong guarantees on its safety.
I don’t really disagree with any of the other points in your comment. Without a regulatory framework, it seems very likely that prosaic safety techniques will only contribute to bad outcomes. So it makes sense to me if one wants to focus on agent foundations and similar theoretic work. My post is not intended as a critique of agent foundations per se!
However, I do believe that one must be clear-sighted about the risks of theoretic work, particularly when built upon abstractions. My critique is that agent foundations sometimes fails to make its assumptions explicit and works backwards from abstractions, effectively building a castle in the sky. A more robust approach would be to make these assumptions very explicit, ideally linking a theory to a set of axioms, so that we can better assess the defensibility of a theory. Some branches of continental philosophy are very bad at this (e.g. Lacan), starting from “metaphor” rather than an axiom, which is why I draw the parallel.
I will note that prosaic safety work could be relevant under a strong regulatory framework. For example, suppose we established an international treaty to freeze AI development at ChatGPT 5.5 Pro / Mythos. The treaty states that we can only advance to higher capabilities/intelligence when we are “sure” that the next model is aligned. With huge amounts of resources dedicated to verifying that the next model is safe, it seems feasible to me that prosaic approaches could play a large role in building safe AI under such a regime.
Now, setting up sufficiently strong regulation is of course very hard, and one might object that “proving” that the next generation of a model is aligned is akin to solving alignment itself! But I suspect that guaranteeing a single model is aligned is much easier than solving alignment for all possible models.
I would still guess it is better not to do prosaic safety work until a global regulatory framework exists, since such work accelerates AI progress and thus reduces opportunities to implement said regulation. But there are enough counterarguments that I would be careful about moralizing over it (not suggesting anyone in the comments is doing so!).
(1) It’s surprising to me that you bring up analytic philosophy as a better parallel. Writing in agent foundations / LessWrong feels very different to me than analytic philosophy!
Analytic philosophy works within a well-established and rigorous taxonomy of terms / concepts, as evidenced by, e.g., PhilPapers and the Stanford Encyclopedia of Philosophy. The assumptions at the roots of this taxonomy are generally pretty well explored. So even if philosophers are not exactly deducing an entire chain of belief for every paper, we can usually articulate the tradition within which an author operates, and understand the common arguments and axioms.
This is in contrast to continental philosophy, which is often much less explicit about its assumptions, and instead draws on a hodge-podge of different thinkers, ranging from Freud to Hegel, without rigorously examining its own claims. Not all continental philosophy is like this! Alain Badiou, for example, starts from an ontological exploration of reality based on set theory to build up to his theory of politics. But the parallel with continental philosophy is exactly to point out this lack of consistency and this poor habit of leaving assumptions implicit.
If others believe I am being too generous in my treatment of analytic philosophy, I’d be interested to hear why.
(2) I agree with your point that the examples could be improved.
(3) I agree that clean conclusions would be nice, but it seems legitimate as well to simply identify the problem. I’d also assert that some of the conclusions are implicit in the critique, i.e. be cautious of formalizing an inherently imprecise concept, or don’t treat the “epistemic status” label as permission to advocate for a dubious opinion. Agent foundations has a very hard task set for itself, so I wouldn’t pretend to have the answers for how it can ensure intellectual rigor.
EY has been incredibly productive, so while I’m sure there are counterpoints like those you cited, The Sequences themselves seem like a clear example of a more verbose writing style (without making an assessment of this as good/bad; maybe it’s fit for purpose! My critique is that this has influenced others to replicate the style when it may not be appropriate).