Helena Casademunt
Karma: 80
Censored LLMs as a Natural Testbed for Secret Knowledge Elicitation
Bartosz Cywiński, Helena Casademunt, Khoi Tran, aryaj, Sam Marks and Neel Nanda
9 Mar 2026 18:50 UTC, 30 points, 2 comments, 5 min read
Steering Out-of-Distribution Generalization with Concept Ablation Fine-Tuning
kh4dien, Helena Casademunt, Adam Karvonen, Sam Marks, Senthooran Rajamanoharan and Neel Nanda
23 Jul 2025 14:57 UTC, 79 points, 8 comments, 5 min read