Zhijing Jin
Karma: 27
Testing the Authoritarian Bias of LLMs
Zhijing Jin, Irene Strauss, David Guzman Piedrahita and Keenan Samway
9 Aug 2025 18:09 UTC
10 points, 1 comment, 6 min read
Why Reasoning Isn’t Enough: How LLM Agents Struggle with Ethics and Cooperation
Zhijing Jin, David Guzman Piedrahita, Yongjin Yang and Steffen Backmann
28 Jun 2025 20:43 UTC
6 points, 0 comments, 4 min read
Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability
Zhijing Jin, Punya Syon Pandey, samuelsimko and Kellin Pelrine
11 Jun 2025 19:30 UTC
6 points, 0 comments, 5 min read
Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games
David Guzman Piedrahita, Yongjin Yang and Zhijing Jin
22 Apr 2025 19:25 UTC
24 points, 3 comments, 5 min read
Zhijing Jin
27 Sep 2023 16:10 UTC
1
point
0
in reply to: Mo Putera’s comment on: Welcome to Apply: The 2024 Vitalik Buterin Fellowships in AI Existential Safety by FLI!
Thank you for spotting it! I just did the fix :).
Welcome to Apply: The 2024 Vitalik Buterin Fellowships in AI Existential Safety by FLI!
Zhijing Jin
25 Sep 2023 18:42 UTC
5 points, 2 comments, 2 min read