As a concrete example, this left me wanting to read more about how you think your alignment-faking paper falls short of the desiderata described here. You link it as an example of a scheming analog:
Research studying current AIs that are intended to be analogous to schemers won’t yield compelling empirical evidence or allow us to productively study approaches for eliminating scheming, as these AIs aren’t very analogous in practice.
IIUC, that research used an actual production model but relied on contrived inputs to elicit alignment-faking behavior. Why is that worse / less “real” than, e.g., using rigged training processes?
You also say:
The main alternatives to studying actual schemers are to study AIs that are trained to behave like a schemer might behave, or to study abstract analogies of scheming.
It’s not clear to me which of these categories the alignment-faking paper falls into, and I’d find it useful to hear how you think it exhibits the downsides you describe for those categories (e.g. being insufficiently analogous to actually dangerous scheming).
One potential argument in favor of expecting large-particle transmission is, “Colds make people cough and sneeze. Isn’t it likely that the most common infection that causes coughs and sneezes would also be one that spreads easily via coughs and sneezes?”
But I’m not sure how much evidence that argument really provides. I’m curious what others think.