TerryJCZhang
Karma: 26
Make Smarter AI Safer
Explaining undesirable model behavior: (How) can influence functions help?
Zhijing Jin, TerryJCZhang and Punya Syon Pandey
2 Mar 2026 11:30 UTC, 18 points, 0 comments, 3 min read
The Multi-Agent Minefield: Can LLMs Cooperate to Avoid Global Catastrophe?
Zhijing Jin, Thao Amelia Pham, TerryJCZhang, pepijn_cobben, Angelo Huang, Isabel Dahlgren and Jacob Brinton
17 Feb 2026 16:55 UTC, 14 points, 2 comments, 5 min read
Replication of Koorndijk (2025): Differential Compliance May Reflect Prompt Sensitivity Rather Than Strategic Reasoning
Chijioke Ugwuanyi and TerryJCZhang
13 Feb 2026 16:12 UTC, 2 points, 0 comments, 8 min read