ChrisCundy

Karma: 70

Avoiding AI Deception: Lie Detectors can either Induce Honesty or Evasion

5 Jun 2025 23:07 UTC
22 points
2 comments, 5 min read, LW link
(far.ai)

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

7 Feb 2025 3:57 UTC
37 points
0 comments, 10 min read, LW link