One big element of the danger of unaligned AI is that it acts as a coherent entity: an agent that has agency and can do things. We could try to remove this property from the model, for example by gradient routing and ablation (a rough sketch follows the list below). But agents are useful. We want to give the LM tasks that it executes on our behalf. Can we give it tasks without it being a coherent unit with potential goals of its own? I think it should be possible to shape the model so that it has a reduced form of agency. What forms could this agency take?
Oracle: knows and predicts, but has no identity or goals of its own
Delegate: acts without an identity of its own, modeling the identity of the user instead
Tool/Service/Automation: runs a standardized process across all users without "being" that process
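For concreteness, here is a minimal sketch of the gradient-routing-then-ablation idea in PyTorch. Everything concrete in it is an illustrative assumption (the toy MLP, the mask split, the alternating forget/retain batches); it is not a faithful reproduction of any particular implementation. The point is only the mechanism: gradients from designated data are routed into a designated block of hidden units during training, and that block is zeroed out afterwards.

```python
# Sketch: localize a capability in a block of units via gradient routing,
# then ablate that block. Illustrative assumptions throughout.
import torch
import torch.nn as nn

HIDDEN = 64
ROUTED = 16  # hidden units meant to absorb the capability we want to remove

class RoutedMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(10, HIDDEN)
        self.fc2 = nn.Linear(HIDDEN, 2)

    def forward(self, x, route_mask=None):
        h = torch.relu(self.fc1(x))
        if route_mask is not None:
            # Forward values are unchanged; gradients only flow back through
            # units where route_mask == 1 (detach blocks the rest).
            h = route_mask * h + (1.0 - route_mask) * h.detach()
        return self.fc2(h)

model = RoutedMLP()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Complementary masks: "forget" batches may only update the routed block,
# "retain" batches may only update the rest, so the unwanted capability
# ends up localized in the routed block.
forget_mask = torch.zeros(HIDDEN)
forget_mask[:ROUTED] = 1.0
retain_mask = 1.0 - forget_mask

for step in range(200):
    x = torch.randn(32, 10)         # stand-in data
    y = torch.randint(0, 2, (32,))  # stand-in labels
    mask = forget_mask if step % 2 == 0 else retain_mask
    loss = loss_fn(model(x, route_mask=mask), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

# Ablation: zero the routed units, deleting whatever was localized there.
with torch.no_grad():
    model.fc1.weight[:ROUTED] = 0.0
    model.fc1.bias[:ROUTED] = 0.0
```

The detach trick leaves the forward pass untouched while blocking gradients through the complementary units; a fuller version would also mask the output-layer columns that read from the routed block.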