The last of these seems very unlike the others? We have a list of potential alignment failures, bypassing safeguards, etc...and then one time when the model carelessly deleted the wrong thing. Is the researcher still really mad at Claude about that or something?
The last of these seems very unlike the others? We have a list of potential alignment failures, bypassing safeguards, etc...and then one time when the model carelessly deleted the wrong thing. Is the researcher still really mad at Claude about that or something?