lennie
Karma: 53
Whack-a-mole: generalisation resistance could be facilitated by training-distribution imprintation
lennie · 13 Dec 2025 17:46 UTC
23 points · 0 comments · 14 min read
Which differences between sandbagging evaluations and sandbagging safety research are important for control?
lennie · 6 Oct 2025 18:20 UTC
6 points · 0 comments · 11 min read
Sandbagging: distinguishing detection of underperformance from incrimination, and the implications for downstream interventions.
lennie · 6 Oct 2025 14:00 UTC
8 points · 0 comments · 8 min read
[Question] Feedback request: Is the time right for an AI Safety stack exchange?
lennie · 26 Sep 2025 9:14 UTC
22 points · 0 comments · 4 min read