Khoi Tran
Karma: 97
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
Bartosz Cywiński, Helena Casademunt, Khoi Tran, aryaj, Sam Marks and Neel Nanda
9 Mar 2026 18:50 UTC · 30 points · 1 comment · 5 min read
Test your interpretability techniques by de-censoring Chinese models
Khoi Tran, aryaj, Senthooran Rajamanoharan and Neel Nanda
15 Jan 2026 16:33 UTC · 90 points · 14 comments · 20 min read