
Zhijing Jin

Karma: 53

Explaining undesirable model behavior: (How) can influence functions help?

2 Mar 2026 11:30 UTC
18 points
0 comments · 3 min read · LW link

The Multi-Agent Minefield: Can LLMs Cooperate to Avoid Global Catastrophe?

17 Feb 2026 16:55 UTC
14 points
2 comments · 5 min read · LW link

Testing the Authoritarian Bias of LLMs

9 Aug 2025 18:09 UTC
10 points
1 comment · 6 min read · LW link

Why Reasoning Isn’t Enough: How LLM Agents Struggle with Ethics and Cooperation

28 Jun 2025 20:43 UTC
6 points
0 comments · 4 min read · LW link

Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability

11 Jun 2025 19:30 UTC
6 points
0 comments · 5 min read · LW link

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

22 Apr 2025 19:25 UTC
24 points
3 comments · 5 min read · LW link

Welcome to Apply: The 2024 Vitalik Buterin Fellowships in AI Existential Safety by FLI!

25 Sep 2023 18:42 UTC
5 points
2 comments · 2 min read · LW link