
TerryJCZhang

Karma: 26

Make Smarter AI Safer

Explaining undesirable model behavior: (How) can influence functions help?

2 Mar 2026 11:30 UTC
18 points
0 comments, 3 min read

The Multi-Agent Minefield: Can LLMs Cooperate to Avoid Global Catastrophe?

17 Feb 2026 16:55 UTC
14 points
2 comments, 5 min read

Replication of Koorndijk (2025): Differential Compliance May Reflect Prompt Sensitivity Rather Than Strategic Reasoning

13 Feb 2026 16:12 UTC
2 points
0 comments, 8 min read