Mia Taylor
Karma: 137
Harmless reward hacks can generalize to misalignment in LLMs
Mia Taylor and Owain_Evans
26 Aug 2025 17:32 UTC, 46 points, 6 comments, 7 min read
Model Organisms for Emergent Misalignment
Anna Soligo, Edward Turner, Mia Taylor, Senthooran Rajamanoharan and Neel Nanda
16 Jun 2025 15:46 UTC, 110 points, 15 comments, 5 min read