Cody Rushing
Karma: 583
Powerful misaligned AIs may be extremely persuasive, especially absent mitigations
Cody Rushing
16 Jan 2026 8:08 UTC, 68 points, 5 comments, 14 min read
Factored Cognition Strengthens Monitoring and Thwarts Attacks
Aaron Sandoval
18 Jun 2025 18:28 UTC, 29 points, 0 comments, 25 min read
Ctrl-Z: Controlling AI Agents via Resampling
Aryan Bhatt, Buck, Adam Kaufman and Tyler Tracy
16 Apr 2025 16:21 UTC, 126 points, 0 comments, 20 min read
[Paper] All’s Fair In Love And Love: Copy Suppression in GPT-2 Small
CallumMcDougall, Arthur Conmy, Tom McGrath and Neel Nanda
13 Oct 2023 18:32 UTC, 82 points, 4 comments, 8 min read