Noosphere89 comments on What Is The Alignment Problem?

Noosphere89 16 Jan 2025 15:40 UTC
4 points
- How do we carve out “the system” from “the environment”, i.e. how do we draw a Cartesian boundary, in order to roughly match human instincts like “looks like it’s robustly optimizing for X”? That’s an open question, and probably a special case of the more general question of how humans abstract out subsystems from their environment. (This was actually relevant in the previous section too, but it’s more apparent once agency is introduced.)
Another interesting question is: how can we derive a theory of agency that stays consistent, and still yields answers, even when such a boundary can be shifted arbitrarily?
For example, quantum mechanics has exactly this issue, in the form of the natural question “who measures the measurement apparatus?” The solution in quantum mechanics is to declare that the boundary is arbitrary, and to require that the description remain consistent wherever the boundary is placed.
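To make the consistency requirement concrete, here is a minimal worked sketch for an idealized (von Neumann-style) measurement. The symbols are my own notation, not from the post: an orthonormal system basis $|s_i\rangle$, apparatus pointer states $|A_i\rangle$ assumed orthonormal, and a unitary $U$ assumed to correlate the two perfectly. Under those assumptions, the predicted outcome probabilities are the same whether the boundary is drawn after the system or after the apparatus.

```latex
\begin{align*}
|\psi\rangle &= \sum_i c_i\,|s_i\rangle
  && \text{(boundary drawn right after the system)} \\
P_{\text{cut 1}}(i) &= |\langle s_i|\psi\rangle|^2 = |c_i|^2 \\
U\big(|s_i\rangle\otimes|A_0\rangle\big) &= |s_i\rangle\otimes|A_i\rangle,
  \qquad \langle A_i|A_j\rangle = \delta_{ij}
  && \text{(apparatus moved inside the boundary)} \\
|\Psi\rangle &= U\big(|\psi\rangle\otimes|A_0\rangle\big)
  = \sum_i c_i\,|s_i\rangle\otimes|A_i\rangle \\
P_{\text{cut 2}}(i) &= \big\|\big(I\otimes|A_i\rangle\langle A_i|\big)\,|\Psi\rangle\big\|^2 = |c_i|^2
\end{align*}
```

Both placements of the boundary give $|c_i|^2$, which is the sense in which the description is indifferent to where the boundary sits, at least in this idealized case.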
Indeed, physically universal cellular automata, proved to exist by Luke Schaeffer, treat the controller (in this case the agent) as the same type of physical system as the system to be controlled (in this case the environment), with no boundary between them, or at best a boundary that can be shifted arbitrarily. That makes them potential foundations for a more workable theory of agency than the old dualistic view (for our universe).
More here:
https://arxiv.org/abs/1009.1720
https://eccc.weizmann.ac.il//report/2014/084/
https://arxiv.org/abs/1501.03988
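As a rough illustration of the “no built-in boundary” point, here is a toy sketch in Python. It is not Schaeffer's rule and I am not claiming it is physically universal: it is just a reversible block cellular automaton on a torus, with a made-up rule (rotate every 2x2 block by 180 degrees, alternating between two block partitions). All names (`step`, `evolve`, `unevolve`, `split`, `boundary_a`, `boundary_b`) are hypothetical. The thing to notice is that the update rule never refers to a controller or a system; the split is bookkeeping the observer applies afterwards, and it can be moved without changing the dynamics.

```python
import numpy as np

def step(grid: np.ndarray, offset: int) -> np.ndarray:
    """One block update on a torus: rotate every 2x2 block by 180 degrees.

    offset (0 or 1) selects which of the two alternating block partitions is used.
    Toy rule for illustration only; it is NOT the rule from Schaeffer's paper.
    """
    # Shift so the chosen partition's blocks start at even indices.
    g = np.roll(grid, shift=(-offset, -offset), axis=(0, 1))
    n, m = g.shape
    # View the grid as 2x2 blocks, then reverse both in-block axes (180-degree rotation).
    blocks = g.reshape(n // 2, 2, m // 2, 2)
    rotated = blocks[:, ::-1, :, ::-1].reshape(n, m)
    # Undo the shift.
    return np.roll(rotated, shift=(offset, offset), axis=(0, 1))

def evolve(grid: np.ndarray, steps: int) -> np.ndarray:
    """Run forward, alternating the two block partitions."""
    for t in range(steps):
        grid = step(grid, t % 2)
    return grid

def unevolve(grid: np.ndarray, steps: int) -> np.ndarray:
    """Run backward: each step is its own inverse, so replay them in reverse order."""
    for t in reversed(range(steps)):
        grid = step(grid, t % 2)
    return grid

def split(grid: np.ndarray, controller_mask: np.ndarray) -> tuple[int, int]:
    """Observer-side bookkeeping: particle counts on each side of a chosen boundary."""
    return int(grid[controller_mask].sum()), int(grid[~controller_mask].sum())

# A single particle on an 8x8 torus.
grid = np.zeros((8, 8), dtype=np.uint8)
grid[0, 0] = 1

# Two different, equally arbitrary placements of the controller/system boundary.
cols = np.tile(np.arange(8), (8, 1))
boundary_a = cols < 4   # columns 0..3 count as "controller"
boundary_b = cols < 2   # columns 0..1 count as "controller"

later = evolve(grid, 2)
print(split(later, boundary_a), split(later, boundary_b))
```

The same `later` grid is described as “particle still in the controller” under one boundary and “particle in the system” under the other, while `evolve` itself is unaffected by the choice. A real physically universal automaton has a much stronger property (any finite region can be steered to any configuration from outside), which this sketch does not attempt to show.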