For me, the OP brought to mind another kind of “not really math, not really science”: string theory. My criticisms of agent foundations research are analogous to Sabine Hossenfelder’s criticisms of string theory, in that string theory and agent foundations both screen themselves off from the possibility of experimental testing in their choice of subject matter: the Planck scale and very early universe for the former, and idealized superintelligent systems for the latter. For both, real-world counterparts (known elementary particles and fundamental forces; humans and existing AI systems) of the objects they study are primarily used as targets to which to overfit their theoretical models. They don’t make testable predictions about current or near-future systems. Unlike with early computer science, agent foundations doesn’t come with an expectation of being able to perform experiments in the future, or even to perform rigorous observational studies.
Ah, I think this is a straightforward misconception of what agent foundations is. (Or at least, of what my version of agent foundations is.) I am not trying to forge a theory of idealized superintelligent systems. I am trying to forge a theory of “what the heck is up with agency at all??”. I am attempting to forge a theory that can make testable predictions about current and near-future systems.
I was describing reasoning about idealized superintelligent systems as the method used in agent foundations research, rather than its goal. In the same way that string theory is trying to figure out “what is up with elementary particles at all,” and tries to answer that question by doing not-really-math about extreme energy levels, agent foundations is trying to figure out “what is up with agency at all” by doing not-really-math about extreme intelligence levels.
If you’ve made enough progress in your research that it can make testable predictions about current or near-future systems, I’d like to see them. But the persistent failure of agent foundations research to come up with any such bridge between idealized models and real-world systems has made me doubtful that the former are relevant to the latter.
I predicted that LLM ICL would perform reasonably well at predicting the universal distribution without finetuning, and it apparently does:
https://www.alignmentforum.org/posts/xyYss3oCzovibHxAF/llm-in-context-learning-as-approximating-solomonoff
Would love to see a follow-up experiment on this.
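For readers less familiar with the term, here is a brief statement of the quantity involved, using the standard definitions; this is my gloss, not necessarily the exact setup of the linked post:

```latex
% Universal (Solomonoff) prior over binary strings, with U a universal prefix machine:
m(x) \;=\; \sum_{p \,:\, U(p)\ \text{outputs a string beginning with}\ x} 2^{-|p|}
% "LLM ICL approximates Solomonoff induction" then means, roughly, that the model's
% in-context next-token probabilities track the induced predictive posterior:
P_{\mathrm{LLM}}\!\left(x_{n+1} \mid x_{1:n}\right) \;\approx\; \frac{m(x_{1:n}\,x_{n+1})}{m(x_{1:n})}
```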
I haven’t looked into it yet, but apparently Peter Bloem showed that pretraining on a Solomonoff-like task also improves performance on text prediction: https://arxiv.org/abs/2506.20057
Taken together, this seems like some empirical evidence for LLM ICL approximating Solomonoff induction, which is a frame I’ve been using that is clearly motivated by a type of “agent foundations” or at least “learning foundations” intuition. Of course it’s very loose. I’m working on a better example.
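For what it’s worth, here is a rough, hypothetical sketch of the kind of follow-up experiment being asked for; it is purely my own construction rather than the setup of the linked post or of the Bloem paper. The idea: estimate a universal-distribution-like prior by Monte Carlo over random programs for a tiny interpreter, then compare an LLM’s in-context next-bit probabilities against it. Every name here (including the llm_next_bit stub) is an assumption.

```python
# Hypothetical sketch: Monte Carlo estimate of a universal-distribution-like
# prior over bit strings, to compare against an LLM's in-context predictions.

import random
from collections import Counter

def run_program(code: str, max_steps: int = 200, max_out: int = 32) -> str:
    """Run a tiny Brainfuck-like program over a tape of counters; '.' emits
    the current cell mod 2. Returns the output bit string (step-limited)."""
    tape = [0] * 64
    ptr, pc, steps, out = 0, 0, 0, []
    while pc < len(code) and steps < max_steps and len(out) < max_out:
        op = code[pc]
        if op == '+':
            tape[ptr] += 1
        elif op == '>':
            ptr = (ptr + 1) % len(tape)
        elif op == '<':
            ptr = (ptr - 1) % len(tape)
        elif op == '.':
            out.append(str(tape[ptr] % 2))
        elif op == '[' and tape[ptr] == 0:   # skip forward past matching ']'
            depth = 1
            while depth and pc + 1 < len(code):
                pc += 1
                depth += {'[': 1, ']': -1}.get(code[pc], 0)
        elif op == ']' and tape[ptr] != 0:   # jump back to matching '['
            depth = 1
            while depth and pc > 0:
                pc -= 1
                depth += {']': 1, '[': -1}.get(code[pc], 0)
        pc += 1
        steps += 1
    return ''.join(out)

def sample_prior(n_programs: int = 20000, ops: str = '+><.[]') -> Counter:
    """Sample random programs with geometrically distributed length (short
    programs exponentially more likely, mimicking the 2^-|p| weighting)
    and count the bit strings they output."""
    counts = Counter()
    for _ in range(n_programs):
        length = 1
        while random.random() < 0.8 and length < 40:
            length += 1
        out = run_program(''.join(random.choice(ops) for _ in range(length)))
        if out:
            counts[out] += 1
    return counts

def reference_next_bit(counts: Counter, prefix: str) -> float:
    """Empirical estimate of P(next bit = 1 | prefix) under the sampled prior."""
    ones = sum(c for s, c in counts.items()
               if s.startswith(prefix) and len(s) > len(prefix) and s[len(prefix)] == '1')
    total = sum(c for s, c in counts.items()
                if s.startswith(prefix) and len(s) > len(prefix))
    return ones / total if total else 0.5

def llm_next_bit(prefix: str) -> float:
    """Stub (hypothetical): return the LLM's in-context probability that the
    next token after `prefix` is '1', e.g. extracted from token logprobs."""
    raise NotImplementedError

if __name__ == '__main__':
    counts = sample_prior()
    for prefix in ['00', '01', '000', '0101']:
        print(prefix, '-> reference P(1 | prefix) ~',
              round(reference_next_bit(counts, prefix), 3))
```

The comparison itself would then be something like a calibration plot of llm_next_bit(prefix) against the printed reference probabilities, across many prefixes and model scales.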
(Incidentally, I would probably be considered to be in math academia)
...I also do not use “reasoning about idealized superintelligent systems as the method” of my agent foundations research. Certainly there are examples of this in agent foundations, but it is not the majority. It is not the majority of what Garrabrant or Demski or Ngo or Wentworth or Turner do, as far as I know.
It sounds to me like you’re not really familiar with the breadth of agent foundations. Which is perfectly fine, because it’s not a cohesive field yet, nor is the existing work easily understandable. But I think you should aim for your statements to be more calibrated.
Notably, in the case of string theory, the fact that it predicts everything we currently observe, plus new forces at the Planck scale, currently makes it better than all other theories of physics: the alternatives either predict something we have reason to believe we do not observe, or limit themselves to a subset of the predictions that other theories already make. So the fact that string theory can reproduce everything we observe while also making (admittedly difficult-to-falsify) new predictions is enough to make it a leading theory.
No comment on whether the same applies to agent foundations.
Hmm, my outsider impression is that there are in fact a myriad of “string theories”, all of them predicting everything we observe, but with no way to experimentally discern the correct one among them for the foreseeable future, which I have understood to be the main criticism. Is this broad-strokes picture fundamentally mistaken?
There are a large number of “string vacua” which contain particles and interactions with the quantum numbers and symmetries we call the standard model, but (1) they typically contain a lot of other stuff that we haven’t seen, and (2) the real test is whether the constants (e.g. masses and couplings) are the same as observed, and these are hard to calculate (though the calculations are improving).