RSS

Simon Lermen

Karma: 396

Creat­ing un­re­stricted AI Agents with Com­mand R+

Simon Lermen16 Apr 2024 14:52 UTC
69 points
12 comments5 min readLW link

unRLHF—Effi­ciently un­do­ing LLM safeguards

12 Oct 2023 19:58 UTC
117 points
15 comments20 min readLW link

LoRA Fine-tun­ing Effi­ciently Un­does Safety Train­ing from Llama 2-Chat 70B

12 Oct 2023 19:58 UTC
148 points
29 comments14 min readLW link

Ro­bust­ness of Model-Graded Eval­u­a­tions and Au­to­mated Interpretability

15 Jul 2023 19:12 UTC
44 points
5 comments9 min readLW link