Thank you so much for the excellent and insightful post on mechanistic models, Evan!
My hypothesis is that the difficulty of finding mechanistic models that consistently make accurate predictions stems from the complexity and computational irreducibility of the agent-environment system. Such agent-environment interactions may be inherently unpredictable “because of the difficulty of pre-stating the relevant features of ecological niches, the complexity of ecological systems and [the fact that the agent-ecology interaction] can enable its own novel system states.”
Suppose that one wants to consistently make accurate predictions about a computationally irreducible agent-environment system. In general, the most efficient way to do so is to run the agent in the given environment. There are probably no shortcuts, even via mechanistic models.
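To make the "no shortcuts" intuition concrete, here is a minimal sketch (my own toy illustration, not something from Evan's post) using Wolfram's Rule 30 cellular automaton, a standard example of a computationally irreducible system: as far as anyone knows, the only general way to learn the centre cell's value after t steps is to simulate all t steps.

```python
# Toy illustration of computational irreducibility via Rule 30.
# (Assumes only NumPy; the width and step count are arbitrary choices.)
import numpy as np

def rule30_step(row: np.ndarray) -> np.ndarray:
    """Apply one step of Rule 30: new cell = left XOR (centre OR right)."""
    left = np.roll(row, 1)
    right = np.roll(row, -1)
    return left ^ (row | right)

def centre_cell_after(t: int, width: int = 201) -> int:
    """Return the centre cell's value after t steps.
    The only general method we have is to run every intermediate step."""
    row = np.zeros(width, dtype=np.uint8)
    row[width // 2] = 1  # start from a single 'on' cell in the middle
    for _ in range(t):
        row = rule30_step(row)
    return int(row[width // 2])

if __name__ == "__main__":
    # Predicting step 80 requires simulating steps 1..80 -- the analogue of
    # having to run the agent in its environment rather than shortcutting it.
    print(centre_cell_after(80))
```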
For dangerous AI agents, an accurate simulation box of the deployment environment would be ideal for safe empiricism. This is probably intractable for many use cases, but computational irreducibility implies that non-empirical methods are likely to be even less tractable.
Please read my post “The limited upside of interpretability” for a detailed argument. It would be great to hear your thoughts!
Thank you so much for sharing this extremely insightful argument, Evan! I really appreciate hearing your detailed thoughts on this.
I’ve been grappling with the pros and cons of an atheoretical-empirics-based approach (in your language, “behavior”) and a theory-based approach (in your language, “understanding”) within the complex sciences, such as but not limited to AI. My current thought is that unfortunately, both of the following are true:
1) Findings based on atheoretical empirics are susceptible to being brittle, in that it is unclear whether or in precisely which settings these findings will replicate. (e.g., see “A problem in theory” by Michael Muthukrishna and Joe Henrich: https://www.nature.com/articles/s41562-018-0522-1)
2) While theoretical models enable one to meaningfully attempt predictions that extrapolate outside of the empirical sample, these models can always fail, especially in the complex sciences (a toy illustration follows this list). “There is no such thing as a validated predictive model” (https://bmcmedicine.biomedcentral.com/articles/10.1186/s12916-023-02779-w).
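As a deliberately simplistic sketch of point 2 (my own illustration, not taken from the cited papers, and assuming only standard NumPy): a flexible model can fit the observed sample almost perfectly and still be wildly wrong once asked to extrapolate outside the range the data came from.

```python
# A model that fits in-distribution data well can fail badly out of distribution.
import numpy as np

rng = np.random.default_rng(0)

# "In-distribution" sample: x in [0, 3], true relationship is saturating.
x_train = np.linspace(0, 3, 30)
y_train = 1 - np.exp(-x_train) + rng.normal(0, 0.02, size=x_train.shape)

# A flexible polynomial fits the sample almost perfectly...
coeffs = np.polyfit(x_train, y_train, deg=6)

# ...but out-of-distribution, at x = 10, it diverges wildly from the truth.
x_test = 10.0
pred = np.polyval(coeffs, x_test)
truth = 1 - np.exp(-x_test)
print(f"prediction at x=10: {pred:.2f}, true value: {truth:.2f}")
```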
A common difficulty for theory-based predictions out-of-distribution is the tradeoff between precision and generality. Levins (https://www.jstor.org/stable/27836590) described this idea by arguing that a model can simultaneously maximize at most two of three desirable properties: generality, precision, and realism. The following is Levins’ triangle: