
Alex Mallen

Karma: 2,324

Redwood Research

Which goals actually motivate deceptive alignment?

19 May 2026 21:53 UTC
25 points
0 comments · 10 min read

Incriminating misaligned AI models via distillation

15 May 2026 21:43 UTC
115 points
12 comments · 5 min read

Risk reports need to address deployment-time spread of misalignment

Alex Mallen · 15 May 2026 18:20 UTC
64 points
1 comment · 5 min read

Clarifying the role of the behavioral selection model

Alex Mallen · 10 May 2026 19:41 UTC
17 points
0 comments · 4 min read