Maheep Chaudhary
Karma: 30
Awareness Jailbreaking: Revealing True Alignment in Evaluation-Aware Models
Maheep Chaudhary, 29 Dec 2025 21:29 UTC, 10 points, 0 comments, 4 min read
Evaluation Awareness Scales Predictably in Open-Weights Large Language Models
Maheep Chaudhary, 19 Dec 2025 2:47 UTC, 21 points, 0 comments, 6 min read