Joe_Collman comments on Deceptive AI ≠ Deceptively-aligned AI

Joe_Collman 9 Jan 2024 16:23 UTC
…but the AI is actually emitting those outputs in order to create that impression—more specifically, the AI has situational awareness
I think it’s best to avoid going beyond the RFLO (Risks from Learned Optimization) description.
In particular, it is not strictly required that the AI be aiming to “create that impression”, or that it has “situational awareness” in any strong/general sense.
Per footnote 26 in RFLO (footnote 7 in the post):
“Note that it is not required that the mesa-optimizer be able to model (or infer the existence of) the base optimizer; it only needs to model the optimization pressure it is subject to.”
It needs to be:
- Modeling the optimization pressure.
- Adapting its responses to that optimization pressure.
Saying more than that risks confusion, and invites overly narrow approaches.
By all means use things like “in order to create that impression” in an example. It shouldn’t be in the definition.