dwk
Karma: 55
What Drives the Compliance Gap? A Three-Driver Decomposition of Alignment Faking
Nathaniel Mitrani, Rhea Karty, dwk and Alan Cooney
28 May 2026 10:50 UTC, 21 points, 0 comments, 8 min read
LW link (arxiv.org)
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
Yoshua Bengio, Jesse Richardson, dwk and mattmacdermott
24 Feb 2025 18:31 UTC, 45 points, 15 comments, 11 min read
LW link