gwern comments on Auditing language models for hidden objectives