Alec Harris

Karma: 118

A misalignment taxonomy

Alec Harris, 21 Jun 2026 10:20 UTC
13 points
2 comments, 3 min read, LW link

Power-seeking agents will likely be developed

Alec Harris, 20 May 2026 9:26 UTC
42 points
0 comments, 4 min read, LW link

Teaching Models to Dream of Better Monitors through Evaluation Conditioned Training

19 Mar 2026 21:01 UTC
49 points
2 comments, 10 min read, LW link