Jeffrey Ladish

Karma: 1,859

Take SCIFs, it’s dangerous to go alone

1 May 2024 8:02 UTC
35 points
1 comment · 3 min read · LW link

Palisade is hiring Research Engineers

11 Nov 2023 3:09 UTC
23 points
0 comments · 3 min read · LW link

unRLHF—Efficiently undoing LLM safeguards

12 Oct 2023 19:58 UTC
117 points
15 comments · 20 min read · LW link

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

12 Oct 2023 19:58 UTC
148 points
29 comments · 14 min read · LW link

The Agency Overhang

Jeffrey Ladish · 21 Apr 2023 7:47 UTC
81 points
6 comments · 6 min read · LW link

Donation offsets for ChatGPT Plus subscriptions

Jeffrey Ladish · 16 Mar 2023 23:29 UTC
53 points
3 comments · 3 min read · LW link

To determine alignment difficulty, we need to know the absolute difficulty of alignment generalization

Jeffrey Ladish · 14 Mar 2023 3:52 UTC
12 points
3 comments · 2 min read · LW link

Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk?

Jeffrey Ladish · 10 Mar 2023 8:21 UTC
58 points
3 comments · 9 min read · LW link