I am having difficulty seeing why anyone would regard these two viewpoints as opposed.
We discuss this indirectly in the first post in this sequence, which outlines what it means to describe a system through the lens of an agent, tool, or simulator. Yes, the concepts overlap, but there is nonetheless a kind of tension between them. In the case of agent vs. simulator, our central question is: which property is “driving the bus” with respect to the system’s behavior, utilizing the other in its service?
The second post explores the implications of this distinction, predicting different types of values—and thus behavior—from: an agent that contains a simulation of the world and uses it to navigate; a simulator that generates agents because such agents are part of the environment the system is modelling; and a system where the two modes are so entangled that it is meaningless to ask where one ends and the other begins. Specifically, I would expect simulator-first systems to have wide value boundaries that internalize (an approximation of) human values, but narrower, maximizing behavior from agent-first systems.