ChrisCundy

Karma: 70

Avoiding AI Deception: Lie Detectors can either Induce Honesty or Evasion

Jun 5, 2025, 11:07 PM
22 points
2 comments
5 min read
LW link
(far.ai)

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

Feb 7, 2025, 3:57 AM
37 points
0 comments
10 min read
LW link