Jeffrey Ladish

Karma: 1,830

Palisade is hiring Research Engineers

11 Nov 2023 3:09 UTC
22 points
0 comments, 3 min read

unRLHF - Efficiently undoing LLM safeguards

12 Oct 2023 19:58 UTC
116 points
15 comments, 20 min read

LoRA Fine-tuning Efficiently Undoes Safety Training from Llama 2-Chat 70B

12 Oct 2023 19:58 UTC
147 points
29 comments, 14 min read

The Agency Overhang

Jeffrey Ladish, 21 Apr 2023 7:47 UTC
81 points
6 comments, 6 min read

Donation offsets for ChatGPT Plus subscriptions

Jeffrey Ladish, 16 Mar 2023 23:29 UTC
53 points
3 comments, 3 min read

To determine alignment difficulty, we need to know the absolute difficulty of alignment generalization

Jeffrey Ladish, 14 Mar 2023 3:52 UTC
12 points
3 comments, 2 min read

Thoughts on the OpenAI alignment plan: will AI research assistants be net-positive for AI existential risk?

Jeffrey Ladish, 10 Mar 2023 8:21 UTC
58 points
3 comments, 9 min read

AGI systems & humans will both need to solve the alignment problem

Jeffrey Ladish, 24 Feb 2023 3:29 UTC
59 points
14 comments, 4 min read