Thanks for writing this up! I strong-upvoted because, as you say, these ideas are not well-communicated, and this post contributes an explanation that I expect to be clarifying to a significant subset of people confused about agent foundations.
Initially, I wasn’t quite buying the claim that we don’t need any experiments (or, more generally, additional empirics) to understand agency and that all we need is to “just crunch” math and philosophy. The image I had in mind was something like “This theorem proves something non-trivial, or even significantly surprising, about a class of agents that includes humans, and we are in a position to verify it experimentally, so we should do it, to ensure that we’re not fooling ourselves.”
Then this passage made it click for me, and I saw the possibility that maybe we are in a position where armchairs, whiteboards, tons of paper, and Lean code are sufficient:
It’s noteworthy that humanity did indeed deliberately invent the first Turing-complete programming languages before building Turing-complete computers, and we have also figured out a lot of the theory of quantum computing before building actual quantum computers.
When Alan Turing figured out computability theory, he was not doing pure math for math’s sake; he was trying to grok the nature of computation so that we could actually build better computers. And he was not doing typical science, either. He obviously had considerable experience with computers, but I seriously doubt that, for example, work on his 1936 paper involved running into issues which were resolved by doing experiments. I would say agent foundations researchers have similarly considerable experience with agents.
(Another example/[proof of concept]/[existence proof of the reference class] is Einstein’s Arrogance.)
However, the reference class that includes the theory of computation is only one possible reference class that might include the theory of agents.[1] For all (I think) we know, the reference class we are in might also be (or look more like) complex systems studies, where you can prove a bunch of neat things, but there’s also a lot of behavior that is not computationally reducible, so you need to observe, simulate, and crunch the numbers. Moreover, noticing surprising real-world phenomena can guide your attempts to explain them in ~mathematical terms (e.g., how West et al. explained (or re-derived) Kleiber’s law from the properties of intra-organismal resource supply networks[2]).
I don’t know what the theory will look like; to me, its shape remains an open a posteriori question.
[1] Or whatever theory we need to understand agents, since the theory that we need to understand agents need not be a theory of agents (it might be something broader, like, IDK, adaptivity or powerful optimization processes, or maybe there’s a new ontology that cuts across our intuitive notion of agency and kinda dissolves it for the purpose of joint-carving understanding).
[2] The explanation of their proof that I was able to understand is the one in this textbook.
along an axis somewhat different than the main focus here, i think the right picture is: there is a rich field of thinking-studies. it’s like philosophy, math, or engineering. it includes eg Chomsky’s work on syntax, Turing’s work on computation, Gödel’s work on logic, Wittgenstein’s work on language, Darwin’s work on evolution, Hegel’s work on development, Pascal’s work on probability, and very many more past things and very many more still mostly hard-to-imagine future things. given this, i think asking about the character of a “theory of agents” would already soft-assume a wrong answer. i discuss this here
i guess a vibe i’m trying to communicate is: we already have thinking-studies in front of us, and so we can look at it and get a sense of what it’s like. of course, thinking-studies will develop in the future, but its development isn’t going to look like some sort of mysterious new final theory/science being created (though there will be methodological development (like for example the development of set-theoretic foundations in mathematics, or like the adoption of statistics in medical science), and many new crazy branches will be developed (of various characters), and we will surely ≈resolve various particular questions in various ways (though various other questions call for infinite investigations))