AndresCampero
Karma: 24
Quickly Assessing Reward Hacking-like Behavior in LLMs and its Sensitivity to Prompt Variations
AndresCampero
4 Jun 2025 7:22 UTC
26 points, 1 comment, 17 min read