I think this is a useful taxonomy.
The disparity between L2 and L3 is closely related to mesa-optimization.
The problem that “every agent trivially optimises for being the agent that they are” can be somewhat ameliorated by constraining ourselves to instrumental reward functions.