Alek Westover — Karma: 224
How will we do SFT on models with opaque reasoning?
Alek Westover, Vivek Hebbar and egan · 21 Feb 2026 0:00 UTC · 32 points · 17 comments · 7 min read · LW link
Three visions for diffuse control
Alek Westover · 9 Feb 2026 6:41 UTC · 4 points · 0 comments · 3 min read · LW link
Theoretical predictions on the sample efficiency of training policies and activation monitors
Alek Westover and Vivek Hebbar · 10 Jan 2026 23:50 UTC · 18 points · 2 comments · 7 min read · LW link
Four Downsides of Training Policies Online
Alek Westover and egan · 4 Jan 2026 3:17 UTC · 29 points · 4 comments · 3 min read · LW link
Methodological considerations in making malign initializations for control research
Alek Westover, Vivek Hebbar and Julian Stastny · 24 Dec 2025 1:18 UTC · 11 points · 0 comments · 13 min read · LW link
Notes on Software-Based Compute-Usage Verification
Alek Westover · 15 Dec 2025 3:40 UTC · 8 points · 0 comments · 12 min read · LW link
Alek Westover’s Shortform
Alek Westover · 8 Dec 2025 4:24 UTC · 2 points · 17 comments · 1 min read · LW link
Should AI Developers Remove Discussion of AI Misalignment from AI Training Data?
Alek Westover · 23 Oct 2025 15:12 UTC · 51 points · 3 comments · 9 min read · LW link
What training data should developers filter to reduce risk from misaligned AI? An initial narrow proposal
Alek Westover · 17 Sep 2025 15:30 UTC · 44 points · 4 comments · 18 min read · LW link
Why I think AI will go poorly for humanity
Alek Westover · 19 Mar 2025 15:52 UTC · 14 points · 0 comments · 30 min read · LW link
Safe Distillation With a Powerful Untrusted AI
Alek Westover · 20 Feb 2025 3:14 UTC · 5 points · 1 comment · 5 min read · LW link