Cam
Karma: 491
I help run www.geodesicresearch.org
Alignment Pretraining: AI Discourse Causes Self-Fulfilling (Mis)alignment
Cam, Puria, Kyle O’Brien, David Africa, Samuel Ratnam and andyk
21 Dec 2025 0:53 UTC, 195 points, 25 comments, 9 min read
Architectures for Increased Externalisation of Reasoning
Karthik Viswanathan, Liza Pavlova, Mariia Koroliuk, Puria, Cam and Edward James Young
26 Nov 2025 20:24 UTC, 31 points, 2 comments, 13 min read
Generalisation Hacking: a first look at adversarial generalisation failures in deliberative alignment
Cam and Puria
17 Nov 2025 21:44 UTC, 46 points, 2 comments, 8 min read
Open-weight training practices and implications for CoT monitorability
Cam and robert mccarthy
4 Nov 2025 10:49 UTC, 15 points, 0 comments, 9 min read
Cam’s Shortform
Cam
9 Feb 2025 17:32 UTC, 1 point, 19 comments, 1 min read