[Question] Would this be Progress in Solving Embedded Agency?

Would it be progress if one could figure out how to construct an embedded system that has a complete model of a highly compressible world, such that the system can correctly generate a plan that, when executed, puts the world into a particular target state? (More simplifying assumptions follow.)

Here, correct planning means the system does not drop an anvil on its own head as part of its plan, and can generate plans that include any beneficial self-modifications, because it is able to “reason over itself”.

I am imagining a system that gets as its goal a target state of the world that should be reached. The system generates a plan that, when executed, reaches the target. This plan is generated using a breadth-first tree search.
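Under these assumptions, the planner can be sketched as a plain breadth-first search over world states. The `transition` function and the action labels below are hypothetical stand-ins for the environment's perfectly known physics:

```python
from collections import deque

def bfs_plan(initial_state, target_state, actions, transition):
    """Breadth-first search over world states. Returns the shortest
    action sequence that reaches target_state, or None if no plan
    exists. `transition(state, action)` is assumed to deterministically
    return the successor state (the environment's known physics)."""
    frontier = deque([(initial_state, [])])
    visited = {initial_state}
    while frontier:
        state, plan = frontier.popleft()
        if state == target_state:
            return plan  # BFS guarantees this is a shortest plan
        for action in actions:
            nxt = transition(state, action)
            if nxt not in visited:
                visited.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None
```

In a toy counter world where the state is an integer and the actions increment or decrement it, `bfs_plan(0, 3, ["+1", "-1"], lambda s, a: s + 1 if a == "+1" else s - 1)` finds the three-step plan `["+1", "+1", "+1"]`.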

I am making the following additional assumptions:

  • The world and the agent are both highly compressible. This means we can have a representation of the entire environment (including the agent) inside the agent, for some environments. We only concern ourselves with environments where this is the case.

  • To make tree search possible:

    • The environment is discrete.

    • You know the physics of the environment perfectly.

    • You know the current state of the world perfectly.

    • You can compute anything that takes finite compute and memory instantly. (This implies some sense of Cartesianness, as I am sort of imagining the system running faster than the world: it can do an entire tree search in one “clock tick” of the environment.)

With these assumptions does the initial paragraph seem trivial to achieve or would it be considered progress?

My intuition is that this would still need to solve the problem of giving an agent a correct representation of itself, in the sense that it can “plan over itself” arbitrarily. This can be thought of as enabling the agent to reason over the entire environment which includes itself. Is that part a solved problem?
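A toy way to see what “planning over itself” could mean here: make a parameter of the agent part of the searched state, so that self-modification is just another action the planner reasons over. The `speed`/`upgrade` names below are invented for illustration; the point is only that the search finds a plan that modifies the agent when doing so is beneficial:

```python
from collections import deque

def plan(initial, goal_test, actions):
    """BFS over full states, where the state includes one of the
    agent's own parameters (its movement speed). Because the agent
    is inside the state, self-modification is just another action."""
    frontier = deque([(initial, [])])
    seen = {initial}
    while frontier:
        state, p = frontier.popleft()
        if goal_test(state):
            return p
        for name, step in actions:
            nxt = step(state)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, p + [name]))
    return None

# state = (position, speed); "upgrade" modifies the agent itself
actions = [
    ("move", lambda s: (s[0] + s[1], s[1])),
    ("upgrade", lambda s: (s[0], 2)),
]
```

Starting from `(0, 1)` with the goal of reaching position 4, the search returns `["upgrade", "move", "move"]`: upgrading itself first yields a shorter plan than moving four times at speed 1.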

It also seems like you can think about a lot of memory optimizations in this setting. For example, you can keep only one world model and mutate it during the tree search, saving only the deltas at each node. The system could then do a significant tree search with total memory of roughly 2x the amount required to represent the world model, assuming deltas are generally much smaller than the world model.
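A minimal sketch of this delta scheme, assuming a dict-based world model and a hypothetical `delta_of(world, action)` that returns only the keys an action changes. Each search node stores just its chain of deltas; a node's full state is rebuilt on demand by replaying that chain onto one scratch copy of the base model, which is the only second full copy ever needed (hence the ~2x memory figure):

```python
from collections import deque

def state_at(base, path_deltas):
    """Rebuild a node's full state by replaying its delta chain onto
    a scratch copy of the base world model."""
    world = dict(base)
    for delta in path_deltas:
        world.update(delta)
    return world

def bfs_with_deltas(base, target, actions, delta_of):
    """BFS where each node stores only the deltas from the root,
    not a full world model. `delta_of(world, action)` is assumed
    to return a dict of just the keys the action changes."""
    frontier = deque([([], [])])  # (delta chain, plan)
    seen = set()
    while frontier:
        deltas, plan = frontier.popleft()
        world = state_at(base, deltas)
        key = frozenset(world.items())
        if key in seen:
            continue
        seen.add(key)
        if world == target:
            return plan
        for action in actions:
            d = delta_of(world, action)
            frontier.append((deltas + [d], plan + [action]))
    return None
```

For a one-variable world, `bfs_with_deltas({"x": 0}, {"x": 2}, ["inc", "dec"], lambda w, a: {"x": w["x"] + (1 if a == "inc" else -1)})` returns `["inc", "inc"]` while each node holds only a one-key delta. (Visited-state tracking still costs memory in this sketch; a real implementation would need to budget for that too.)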

It seems like once you have solved these things, you could get a working embedded system that is as smart as possible, in the sense that it would find the shortest plan that results in the target world state.

This topic came up while working on a project where I try to find a minimal set of assumptions under which I know how to construct an aligned system. Once I know how to construct an aligned system under that set of assumptions, I attempt to remove an assumption and adjust the system so that it stays aligned. I am currently trying to remove the Cartesian assumption.
