jorio
Karma: 114
MATS 8.0 Scholar
Concept Poisoning: Probing LLMs without probes
Jan Betley, jorio, dylan_f and Owain_Evans
5 Aug 2025 17:00 UTC
60 points, 5 comments, 13 min read
Selective Generalization: Improving Capabilities While Maintaining Alignment
ariana_azarbal, Matthew A. Clarke, jorio, Cailley Factor and cloud
16 Jul 2025 21:25 UTC
67 points, 4 comments, 7 min read