Nathan Hu
Karma: 154
[Linkpost] Interpreting Language Model Parameters
Lucius Bushnaq, Dan Braun, Oliver Clive-Griffin, Bart Bussmann, Nathan Hu, mivanitskiy, Linda Linsefors and Lee Sharkey
5 May 2026 17:37 UTC · 162 points · 2 comments · 2 min read · LW · link (www.goodfire.ai)
Training on Documents About Reward Hacking Induces Reward Hacking
evhub and Nathan Hu
21 Jan 2025 21:32 UTC · 135 points · 15 comments · 2 min read · LW · link (alignment.anthropic.com)