<@Embedded agency problems@>(@Embedded Agents@) are a class of theoretical problems that arise as soon as an agent is part of the environment it is interacting with and modeling, rather than being cleanly separated from it. This post argues that before we can solve embedded agency problems, we first need to develop a theory of _abstraction_. _Abstraction_ refers to the problem of throwing away some information about a system while still being able to make reliable predictions about it; equivalently, it is the problem of constructing a map for some territory.
The post argues that abstraction is key to embedded agency problems because the underlying challenge of embedded world models is that the agent (the map) is smaller than the environment it is modeling (the territory), and so inherently has to throw some information away.
Some simple questions around abstraction that we might want to answer include:

- Given a map-making process, characterize the queries whose answers the map can reliably predict.
- Given some representation of the map-territory correspondence, translate queries from the territory-representation to the map-representation and vice versa.
- Given a territory, characterize classes of queries which can be reliably answered using a map much smaller than the territory itself.
- Given a territory and a class of queries, construct a map which throws out as much information as possible while still allowing accurate prediction over the query class (see the sketch after this list).
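To make the last question concrete, here is a minimal sketch (my own illustration, not from the post): the territory is a large collection of particle speeds, the query class is {mean, variance}, and the map keeps only three sufficient statistics while discarding everything else.

```python
import random
import statistics

# Territory: the full microstate -- one speed per particle.
random.seed(0)
territory = [random.gauss(500.0, 50.0) for _ in range(100_000)]

def make_map(territory):
    """Compress the territory into a tiny map: three numbers that are
    sufficient statistics for the {mean, variance} query class."""
    n = len(territory)
    return {"n": n,
            "sum": sum(territory),
            "sum_sq": sum(x * x for x in territory)}

def query_mean(m):
    return m["sum"] / m["n"]

def query_variance(m):
    mean = query_mean(m)
    return m["sum_sq"] / m["n"] - mean * mean

m = make_map(territory)

# Queries inside the class are answered (to numerical precision) from the
# three-number map alone...
assert abs(query_mean(m) - statistics.fmean(territory)) < 1e-6
assert abs(query_variance(m) - statistics.pvariance(territory)) < 1e-3
# ...but a query outside the class ("what is particle 42's speed?") is
# unanswerable: that information was deliberately thrown away.
```

The map here is three numbers rather than a hundred thousand, yet it answers every query in the class; the interesting theoretical question is how to characterize and construct such maps in general, rather than for a hand-picked query class.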
The post argues that once we have this basic theory, we will have a natural way of approaching more challenging embedded agency problems, like the problem of self-referential maps, the problem of other map-makers, and the problem of self-reasoning that arises when the produced map includes an abstraction of the map-making process itself.
Asya’s opinion:
My impression is that embedded agency problems are a young, extremely entangled class of problems characterized by a lot of confusion. I am enthusiastic about attempts to decrease that confusion, and intuitively, abstraction does feel like a key component of doing so.
That being said, my guess is that it's difficult to reliably identify the most promising research directions in a space that's so entangled. For example, one thread in the comments of this post discusses the fact that this theory of abstraction, as presented, looks at "one-shot" agency, where the system takes in data and produces its output once, rather than "dynamic" agency, where a system takes in data and outputs decisions repeatedly over time. Abram Demski argues that the "dynamic" nature of embedded agency is a central part of the problem, and that it may be more valuable and neglected to put research emphasis there.