David Udell
Karma: 2,372

Sparse Coding, for Mechanistic Interpretability and Activation Engineering
David Udell | 23 Sep 2023 19:16 UTC | 42 points | 7 comments | 34 min read

ActAdd: Steering Language Models without Optimization
technicalities, TurnTrout, lisathiergart, David Udell, Ulisse Mini and Monte M | 6 Sep 2023 17:21 UTC | 105 points | 3 comments | 2 min read | (arxiv.org)

Steering GPT-2-XL by adding an activation vector
TurnTrout, Monte M, David Udell, lisathiergart and Ulisse Mini | 13 May 2023 18:42 UTC | 423 points | 97 comments | 50 min read

Understanding and controlling a maze-solving policy network
TurnTrout, peligrietzer, Ulisse Mini, Monte M and David Udell | 11 Mar 2023 18:59 UTC | 312 points | 22 comments | 23 min read

Beneath My Epistemic Dignity
David Udell | 28 Feb 2023 4:02 UTC | 6 points | 3 comments | 2 min read

Probability Theory: The Logic of Science, Jaynes
David Udell | 16 Feb 2023 21:57 UTC | 29 points | 0 comments | 18 min read

Rounding Someone Off
David Udell | 24 Jan 2023 0:03 UTC | 25 points | 0 comments | 5 min read

Consequentialists: One-Way Pattern Traps
David Udell | 16 Jan 2023 20:48 UTC | 54 points | 3 comments | 14 min read

Linear Algebra Done Right, Axler
David Udell | 2 Jan 2023 22:54 UTC | 56 points | 6 comments | 9 min read

Naive Set Theory, Halmos
David Udell | 22 Dec 2022 2:34 UTC | 11 points | 1 comment | 8 min read

Moorean Statements
David Udell | 22 Oct 2022 0:50 UTC | 11 points | 11 comments | 1 min read

Dath Ilan’s Views on Stopgap Corrigibility
David Udell | 22 Sep 2022 16:16 UTC | 77 points | 19 comments | 13 min read | (www.glowfic.com)

Guidelines for Mad Entrepreneurs
David Udell | 16 Sep 2022 6:33 UTC | 26 points | 0 comments | 11 min read

Framing AI Childhoods
David Udell | 6 Sep 2022 23:40 UTC | 37 points | 8 comments | 4 min read

The Shard Theory Alignment Scheme
David Udell | 25 Aug 2022 4:52 UTC | 47 points | 32 comments | 2 min read

“What Mistakes Are You Making Right Now?”
David Udell | 15 Aug 2022 21:19 UTC | 13 points | 2 comments | 1 min read

Shard Theory: An Overview
David Udell | 11 Aug 2022 5:44 UTC | 161 points | 34 comments | 10 min read

Team Shard Status Report
David Udell | 9 Aug 2022 5:33 UTC | 38 points | 8 comments | 3 min read

How Deadly Will Roughly-Human-Level AGI Be?
David Udell | 8 Aug 2022 1:59 UTC | 12 points | 6 comments | 1 min read

Finding Skeletons on Rashomon Ridge
David Udell, Peter S. Park and NickyP | 24 Jul 2022 22:31 UTC | 30 points | 2 comments | 7 min read