Zhijing Jin

Karma: 27

Testing the Authoritarian Bias of LLMs

Zhijing Jin, Irene Strauss, David Guzman Piedrahita and Keenan Samway

9 Aug 2025 18:09 UTC

10 points

1 comment, 6 min read

Why Reasoning Isn’t Enough: How LLM Agents Struggle with Ethics and Cooperation

Zhijing Jin, David Guzman Piedrahita, Yongjin Yang and Steffen Backmann

28 Jun 2025 20:43 UTC

6 points

0 comments, 4 min read

Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability

Zhijing Jin, Punya Syon Pandey, samuelsimko and Kellin Pelrine

11 Jun 2025 19:30 UTC

6 points

0 comments, 5 min read

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

David Guzman Piedrahita, Yongjin Yang and Zhijing Jin

22 Apr 2025 19:25 UTC

24 points

3 comments, 5 min read

Zhijing Jin 27 Sep 2023 16:10 UTC
1 point
in reply to: Mo Putera’s comment on: Welcome to Apply: The 2024 Vitalik Buterin Fellowships in AI Existential Safety by FLI!
Thank you for spotting it! I just did the fix :).

Welcome to Apply: The 2024 Vitalik Buterin Fellowships in AI Existential Safety by FLI!

Zhijing Jin 25 Sep 2023 18:42 UTC

5 points

2 comments, 2 min read