Isaac Dunn
Karma: 89
Appendices: Supervised finetuning on low-harm reward hacking generalises to high-harm reward hacking
Isaac Dunn, Kei Nishimura-Gasparian, Carson Denison, Ethan Perez and Robert Kirk
22 Dec 2025 19:33 UTC, 17 points, 0 comments, 1 min read
Supervised finetuning on low-harm reward hacking generalises to high-harm reward hacking
Isaac Dunn, Kei Nishimura-Gasparian, Carson Denison, Ethan Perez and Robert Kirk
22 Dec 2025 19:32 UTC, 15 points, 0 comments, 30 min read
Reward hacking behavior can generalize across tasks
Kei Nishimura-Gasparian, Isaac Dunn, Henry Sleight, Miles Turpin, evhub, Carson Denison and Ethan Perez
28 May 2024 16:33 UTC, 86 points, 5 comments, 21 min read