keith_wynroe
Karma: 294
Finding an Error-Detection Feature in DeepSeek-R1
keith_wynroe
24 Apr 2025 16:03 UTC
15 points, 0 comments, 7 min read
Decomposing the QK circuit with Bilinear Sparse Dictionary Learning
keith_wynroe and Lee Sharkey
2 Jul 2024 13:17 UTC
86 points, 7 comments, 12 min read
An OV-Coherent Toy Model of Attention Head Superposition
Lauren Greenspan and keith_wynroe
29 Aug 2023 19:44 UTC
26 points, 2 comments, 6 min read
Literature review of TAI timelines
Jsevillamol, keith_wynroe and David Atkinson
27 Jan 2023 20:07 UTC
35 points, 7 comments, 2 min read
(epochai.org)
You’re Not One “You”—How Decision Theories Are Talking Past Each Other
keith_wynroe
9 Jan 2023 1:21 UTC
28 points, 11 comments, 8 min read