Brendan Murphy

Karma: 87

Illusory Safety: Redteaming DeepSeek R1 and the Strongest Fine-Tunable Models of OpenAI, Anthropic, and Google

7 Feb 2025 3:57 UTC
37 points
0 comments · 10 min read · LW link

On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback

7 Nov 2024 15:39 UTC
51 points
7 comments · 11 min read · LW link

GPT-4o Guardrails Gone: Data Poisoning & Jailbreak-Tuning

1 Nov 2024 0:10 UTC
18 points
0 comments · 6 min read · LW link
(far.ai)