Yavuz Bakman
Karma: 30
Thinking about AI Alignment and Reliability.
Enjoying Soulsborne games.
LLM Misalignment Can be One Gradient Step Away, and Blackbox Evaluation Cannot Detect It.
Yavuz Bakman
15 Mar 2026 0:19 UTC
31 points, 4 comments, 3 min read