(Nit: This paper didn’t coin the term “alignment faking.” I first learned of the term (which I then passed on to the other co-authors) from Joe Carlsmith’s report Scheming AIs: Will AIs fake alignment during training in order to get power?)