beren (Beren Millidge)
Karma: 1,975
Interested in many things. I have a personal blog at https://www.beren.io/
Posts
The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable
beren and Sid Black | 28 Nov 2022 12:54 UTC | 183 points | 31 comments | 31 min read
Gradient hacking is extremely difficult
beren | 24 Jan 2023 15:45 UTC | 146 points | 19 comments | 5 min read
Basic Facts about Language Model Internals
beren and Eric Winsor | 4 Jan 2023 13:01 UTC | 122 points | 18 comments | 9 min read
Deep learning models might be secretly (almost) linear
beren | 24 Apr 2023 18:43 UTC | 98 points | 20 comments | 4 min read
Deconfusing Direct vs Amortised Optimization
beren | 2 Dec 2022 11:30 UTC | 92 points | 14 comments | 10 min read
Basic facts about language models during training
beren | 21 Feb 2023 11:46 UTC | 85 points | 14 comments | 18 min read
Scaffolded LLMs as natural language computers
beren | 12 Apr 2023 10:47 UTC | 78 points | 9 comments | 11 min read
The surprising parameter efficiency of vision models
beren | 8 Apr 2023 19:44 UTC | 73 points | 28 comments | 4 min read
The Computational Anatomy of Human Values
beren | 6 Apr 2023 10:33 UTC | 63 points | 30 comments | 30 min read
Against ubiquitous alignment taxes
beren | 6 Mar 2023 19:50 UTC | 56 points | 10 comments | 2 min read
The case for removing alignment and ML research from the training dataset
beren | 30 May 2023 20:54 UTC | 46 points | 8 comments | 5 min read
Scaling laws vs individual differences
beren | 10 Jan 2023 13:22 UTC | 42 points | 21 comments | 7 min read
Empathy as a natural consequence of learnt reward models
beren | 4 Feb 2023 15:35 UTC | 38 points | 27 comments | 13 min read
Human sexuality as an interesting case study of alignment
beren | 30 Dec 2022 13:37 UTC | 38 points | 26 comments | 3 min read
An ML interpretation of Shard Theory
beren | 3 Jan 2023 20:30 UTC | 38 points | 5 comments | 4 min read
The ultimate limits of alignment will determine the shape of the long term future
beren | 2 Jan 2023 12:47 UTC | 34 points | 2 comments | 6 min read
Orthogonality is expensive
beren | 3 Apr 2023 10:20 UTC | 33 points | 8 comments | 3 min read
AGI will have learnt utility functions
beren | 25 Jan 2023 19:42 UTC | 33 points | 3 comments | 13 min read
Evidence on recursive self-improvement from current ML
beren | 30 Dec 2022 20:53 UTC | 31 points | 12 comments | 6 min read
Addendum: basic facts about language models during training
beren | 6 Mar 2023 19:24 UTC | 22 points | 2 comments | 5 min read