
Alek Westover

Karma: 224

How will we do SFT on models with opaque reasoning?

21 Feb 2026 0:00 UTC
32 points
17 comments · 7 min read · LW link

Three visions for diffuse control

Alek Westover · 9 Feb 2026 6:41 UTC
4 points
0 comments · 3 min read · LW link

Theoretical predictions on the sample efficiency of training policies and activation monitors

10 Jan 2026 23:50 UTC
18 points
2 comments · 7 min read · LW link

Four Downsides of Training Policies Online

4 Jan 2026 3:17 UTC
29 points
4 comments · 3 min read · LW link

Methodological considerations in making malign initializations for control research

24 Dec 2025 1:18 UTC
11 points
0 comments · 13 min read · LW link

Notes on Software-Based Compute-Usage Verification

Alek Westover · 15 Dec 2025 3:40 UTC
8 points
0 comments · 12 min read · LW link

Alek Westover’s Shortform

Alek Westover · 8 Dec 2025 4:24 UTC
2 points
17 comments · 1 min read · LW link

Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?

Alek Westover · 23 Oct 2025 15:12 UTC
51 points
3 comments · 9 min read · LW link

What training data should developers filter to reduce risk from misaligned AI? An initial narrow proposal

Alek Westover · 17 Sep 2025 15:30 UTC
44 points
4 comments · 18 min read · LW link

Why I think AI will go poorly for humanity

Alek Westover · 19 Mar 2025 15:52 UTC
14 points
0 comments · 30 min read · LW link

Safe Distillation With a Powerful Untrusted AI

Alek Westover · 20 Feb 2025 3:14 UTC
5 points
1 comment · 5 min read · LW link