Mads U
Karma: 0
Mads U
16 Nov 2025 17:36 UTC
1 point
0
on: Sonnet 4.5's eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals
Does this mean that the model will always behave nicely if it always thinks it is being tested?