A team at Google has substantially advanced the theory of embedded agency with a grain of truth (GOT), including new developments on reflective oracles and an interesting alternative construction (the “Reflective Universal Inductor” or RUI).
(I was not involved in this work)
Abstract:
The standard theory of model-free reinforcement learning assumes that the environment dynamics are stationary and that agents are decoupled from their environment, such that policies are treated as being separate from the world they inhabit. This leads to theoretical challenges in the multi-agent setting where the non-stationarity induced by the learning of other agents demands prospective learning based on prediction models. To accurately model other agents, an agent must account for the fact that those other agents are, in turn, forming beliefs about it to predict its future behavior, motivating agents to model themselves as part of the environment. Here, building upon foundational work on universal artificial intelligence (AIXI), we introduce a mathematical framework for prospective learning and embedded agency centered on self-prediction, where Bayesian RL agents predict both future perceptual inputs and their own actions, and must therefore resolve epistemic uncertainty about themselves as part of the universe they inhabit. We show that in multi-agent settings, self-prediction enables agents to reason about others running similar algorithms, leading to new game-theoretic solution concepts and novel forms of cooperation unattainable by classical decoupled agents. Moreover, we extend the theory of AIXI, and study universally intelligent embedded agents which start from a Solomonoff prior. We show that these idealized agents can form consistent mutual predictions and achieve infinite-order theory of mind, potentially setting a gold standard for embedded multi-agent learning.
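For a rough sense of what the self-prediction idea amounts to (my own paraphrase and notation, not the paper's): instead of keeping a policy $\pi$ separate from an environment model, the embedded agent maintains a single Bayesian mixture over joint action–percept histories,

$$\xi(a_1 e_1 \ldots a_t e_t) = \sum_{\nu \in \mathcal{M}} w_\nu \, \nu(a_1 e_1 \ldots a_t e_t),$$

so the same posterior that predicts the next percept $e_{t+1}$ also assigns probabilities to the agent's own next action $a_{t+1}$; the agent is learning about itself as one more part of the universe it inhabits. In the universal case the paper studies, the prior weights $w_\nu$ would come from a Solomonoff-style prior over programs.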
Wake up babe, new decision theory just dropped!
Yes, it seems to be closer to UDT, but… updateful. So not that close to UDT. Really, it’s “just” a mathematically rigorous, embedded EDT.
Could we then say that MUPI obtains acausal coordination from a causal decision theory? This has been suggested a few times in the history of Less Wrong.
Evidential decision theory allows acausal coordination.
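(Toy illustration, with payoff numbers chosen just for this comment, not taken from the paper: in a prisoner's dilemma against a near-copy of yourself, your own choice is strong evidence about the copy's choice, so with $P(\text{copy cooperates} \mid C) \approx 1$ and $P(\text{copy cooperates} \mid D) \approx 0$ you get $E[u \mid C] \approx 3 > 1 \approx E[u \mid D]$ under payoffs of 3 each for mutual cooperation, 1 each for mutual defection, and 5/0 for defecting against a cooperator. EDT cooperates even though nothing you do causally affects the copy.)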
Abstract for those who want to see it without clicking on the link:
Thanks, good suggestion. I copied this up into the body of the post.
This looks exciting. As Jeremy said, the length raises an eyebrow.
I’ve invited the authors to present at the AIXI research meetings (uaiasi.com). It will probably take two presentations. I will advertise here (and other places you will see) once we have dates.
Tentatively 2 pm ET on Monday December 15th at the usual zoom link: https://uwaterloo.zoom.us/j/7921763961?pwd=TDatET6CBu47o4TxyNn9ccL2Ia8HN4.1
Check the calendar for any updates: https://uaiasi.com
It will be one 90-120 minute presentation.
This seems like it’s building on or inspired by work you’ve done? Or was this team interested in embeddedness and reflective oracles for other reasons?
It’s ridiculously long (which is great; I’ll read through it when I get a chance). Do you have any pointers to sections that you think have particularly valuable insights?
I believe they were mainly inspired by Demski and Garrabrant, but we were in contact for the last few months and I’m glad to see that some of my recent work was applicable. We arrived at the idea of using a joint distribution with a grain of truth independently; they introduce a novel “RUI” construction, but they also study (what I’ve been calling) AEDT wrt rOSI in section 5.2. The differences are pretty technical; IMO the RUI approach is halfway between rOSI and logical induction.
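(For anyone who hasn’t seen the term: “grain of truth” here is in the Kalai–Lehrer sense. Roughly, in the notation from the sketch in the post, the joint prior $\xi = \sum_\nu w_\nu \nu$ over action–percept processes has to assign positive weight $w_{\mu^*} > 0$ to the process $\mu^*$ that the interacting agents actually generate, which is the condition under which the Bayesian learners’ predictions merge with the truth. The hard part is constructing a prior rich enough to contain the agents themselves without running into self-reference problems, which is what reflective oracles and the RUI construction are for.)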
It’s so long that even I’m still reading it, and I got a copy early. Assuming you’re familiar with Solomonoff induction / AIXI / embedded agency (which it sounds like you are), the core of it is section 3 and section 5 (particularly 5.1-5.3, I think). The appendix is like 100 pages and so far doesn’t seem essential unless you want to extend the results (also, some of it will be familiar if you read my GOT paper).
Author here. We were heavily inspired by multiple things, including Demski and Garrabrant, the 1990s work of Kalai and Lehrer, empirical work in our group inspired by neuroscience pointing towards systems that predict their own actions, and the earlier work on reflective oracles by Leike. We were not aware of @Cole Wyeth et al.’s excellent 2025 paper, which puts the reflective oracle work on firmer theoretical footing, as our work was (largely but not entirely) done before that paper appeared.
Hey Jeremy! Our team’s interest is mainly in multi-agent learning, self-modeling, and theory of mind. As properly formalizing a coherent theory for these topics turned out to be quite difficult, we dived deeper and deeper and ultimately arrived at the AIXI and reflective oracle frameworks, which provided a good set of tools as starting points for addressing these questions more formally. The resulting ‘monster paper’ is a write-up of the past year of work we did on these topics. Due to our interest in multi-agent learning, a good chunk of the paper is on the game-theoretic behavior of such ‘embedded Bayesian agents’ (Section 4). As Cole mentioned, we independently arrived at some results similar to Cole’s (as we came from a bit outside of the Less Wrong community), and we are very excited to now start collaborating more closely with Cole on the next questions enabled by both of our theories!