Ziqian Zhong
Karma: 105
I do technical AI interp and safety research.
Spontaneous introspection in output tampering
Ziqian Zhong, 26 Apr 2026 20:05 UTC, 25 points, 1 comment, 12 min read
Pando: A Controlled Benchmark for Interpretability Methods
Ziqian Zhong, 21 Apr 2026 21:40 UTC, 6 points, 0 comments, 3 min read (arxiv.org)
Hodoscope: Visualization for Efficient Human Supervision
Ziqian Zhong and Shashwat Saxena, 20 Feb 2026 23:41 UTC, 9 points, 0 comments, 2 min read (hodoscope.dev)
ImpossibleBench: Measuring Reward Hacking in LLM Coding Agents
Ziqian Zhong, 30 Oct 2025 2:52 UTC, 62 points, 5 comments, 3 min read (arxiv.org)
Weight-diff SVD for LLM Monitoring
Ziqian Zhong, 5 Aug 2025 0:31 UTC, 2 points, 0 comments, 2 min read (arxiv.org)