Following Joe Carlsmith, my recommendation is to use them as synonyms, and to use the term “schemer” instead of “deceptively aligned model”. This is what I do.
Joe’s issues with the term “deceptive alignment”:
I think that the term “deceptive alignment” often leads to confusion between the four sorts of deception listed above. And also: if the training signal is faulty, then “deceptively aligned” models need not be behaving in aligned ways even during training (that is, “training gaming” behavior isn’t always “aligned” behavior).