I was describing reasoning about idealized superintelligent systems as the method used in agent foundations research, rather than its goal. In the same way that string theory is trying to figure out “what is up with elementary particles at all,” and tries to answer that question by doing not-really-math about extreme energy levels, agent foundations is trying to figure out “what is up with agency at all” by doing not-really-math about extreme intelligence levels.
If you’ve made enough progress in your research that it can make testable predictions about current or near-future systems, I’d like to see them. But the persistent failure of agent foundations research to come up with any such bridge between idealized models and real-world systems has made me doubtful that the former are relevant to the latter.
I predicted that LLM ICL would perform reasonably well at predicting the universal distribution without finetuning, and it apparently does:
https://www.alignmentforum.org/posts/xyYss3oCzovibHxAF/llm-in-context-learning-as-approximating-solomonoff
Would love to see a follow-up experiment on this.
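To make that concrete, here is a minimal sketch of what one such follow-up could look like. Everything below is my own toy setup, not the methodology of the linked post: a tiny made-up machine stands in for a universal Turing machine, program lengths are drawn geometrically to mimic the 2^-|p| weighting, and a Laplace-rule predictor is a placeholder for the LLM's in-context next-token probabilities.

```python
import math
import random
from collections import defaultdict

random.seed(0)

def run_program(program, max_steps=200, max_out=12):
    """Run a 'program' (list of ints in 0..3) on a toy machine and return its
    binary output string. Ops: 0 -> emit 0, 1 -> emit 1, 2 -> jump back two
    instructions, 3 -> halt. A crude stand-in for a universal machine."""
    out, pc, steps = [], 0, 0
    while 0 <= pc < len(program) and steps < max_steps and len(out) < max_out:
        op = program[pc]
        if op == 0:
            out.append("0")
        elif op == 1:
            out.append("1")
        elif op == 2:
            pc -= 3  # net jump of -2 after the pc += 1 below
        else:
            break
        pc += 1
        steps += 1
    return "".join(out)

def sample_prior(n_samples=20000):
    """Sample output strings from programs with geometrically distributed
    length, mimicking the 2^-|p| weighting of the universal distribution."""
    counts = defaultdict(int)
    for _ in range(n_samples):
        length = 1
        while random.random() < 0.5:
            length += 1
        s = run_program([random.randrange(4) for _ in range(length)])
        if s:
            counts[s] += 1
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def next_bit_probs(dist, prefix):
    """Conditional P(next bit | prefix) under the sampled distribution."""
    p0 = sum(p for s, p in dist.items() if s.startswith(prefix + "0"))
    p1 = sum(p for s, p in dist.items() if s.startswith(prefix + "1"))
    z = p0 + p1
    return (0.5, 0.5) if z == 0 else (p0 / z, p1 / z)

def placeholder_predictor(prefix):
    """Laplace rule of succession -- stands in for the LLM's in-context
    next-token probabilities in the real experiment."""
    p1 = (prefix.count("1") + 1) / (len(prefix) + 2)
    return (1 - p1, p1)

# Average KL divergence from the sampled conditionals to the predictor,
# weighted by how often each prefix occurs under the prior.
dist = sample_prior()
gap, weight = 0.0, 0.0
for s, p in dist.items():
    for i in range(1, len(s)):
        true_p = next_bit_probs(dist, s[:i])
        pred_p = placeholder_predictor(s[:i])
        kl = sum(tp * math.log2(tp / max(qp, 1e-12))
                 for tp, qp in zip(true_p, pred_p) if tp > 0)
        gap += p * kl
        weight += p
print(f"average KL gap: {gap / max(weight, 1e-12):.3f} bits per predicted bit")
```

The interesting number is the KL gap between the sampled conditionals and the predictor; in the real version you would swap `placeholder_predictor` for the probabilities the LLM assigns to 0/1 continuations of the same prefixes.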
I haven’t looked into it yet, but apparently Peter Bloem showed that pretraining on a Solomonoff-like task also improves performance on text prediction: https://arxiv.org/abs/2506.20057
Taken together, this seems like some empirical evidence for LLM ICL as approximating Solomonoff induction, which is a frame I’ve been using, clearly motivated by a type of “agent foundations” or at least “learning foundations” intuition. Of course it’s very loose. I’m working on a better example.
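To spell out what that frame means (this is just the textbook definition of Solomonoff induction, not anything specific to the linked post): with $U$ a universal monotone machine and $|p|$ the length of program $p$, the universal distribution and its predictions are

$$
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|},
\qquad
M(x_{n+1} \mid x_{1:n}) \;=\; \frac{M(x_{1:n} x_{n+1})}{M(x_{1:n})},
$$

where the sum runs over programs whose output begins with $x$. The loose empirical claim is that the LLM’s in-context predictive distribution tracks that conditional: $P_{\mathrm{LLM}}(x_{n+1} \mid x_{1:n}) \approx M(x_{n+1} \mid x_{1:n})$.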
(Incidentally, I would probably be considered to be in math academia)
...I also do not use “reasoning about idealized superintelligent systems as the method” of my agent foundations research. Certainly there are examples of this in agent foundations, but it is not the majority. It is not the majority of what Garrabrant or Demski or Ngo or Wentworth or Turner do, as far as I know.
It sounds to me like you’re not really familiar with the breadth of agent foundations. Which is perfectly fine, because it’s not a cohesive field yet, nor is the existing work easily understandable. But I think you should aim for your statements to be more calibrated.