aditya singh
Karma: 102
How to Design Environments for Understanding Model Motives
gersonkroiz, aditya singh, Senthooran Rajamanoharan and Neel Nanda
2 Mar 2026 7:14 UTC | 42 points | 0 comments | 10 min read
Why Did My Model Do That? Model Incrimination for Diagnosing LLM Misbehavior
aditya singh, gersonkroiz, Senthooran Rajamanoharan and Neel Nanda
27 Feb 2026 3:20 UTC | 51 points | 1 comment | 78 min read
Principled Interpretability of Reward Hacking in Closed Frontier Models
gersonkroiz, aditya singh, Senthooran Rajamanoharan and Neel Nanda
1 Jan 2026 16:37 UTC | 24 points | 0 comments | 23 min read