ariana_azarbal
Karma: 405
Confusion around the term reward hacking
ariana_azarbal
20 Mar 2026 16:13 UTC, 47 points, 5 comments, 5 min read
Recontextualization Mitigates Specification Gaming Without Modifying the Specification
ariana_azarbal, Victor Gillioz, TurnTrout and cloud
14 Oct 2025 0:53 UTC, 144 points, 15 comments, 10 min read
Training a Reward Hacker Despite Perfect Labels
ariana_azarbal, Victor Gillioz and TurnTrout
14 Aug 2025 23:57 UTC, 139 points, 47 comments, 4 min read
Selective Generalization: Improving Capabilities While Maintaining Alignment
ariana_azarbal, Matthew A. Clarke, Jorio Cocola, Cailley Factor and cloud
16 Jul 2025 21:25 UTC, 75 points, 6 comments, 7 min read