beren
Karma: 3,026
Interested in many things. I have a personal blog at https://www.beren.io/
Maintaining Alignment during RSI as a Feedback Control Problem
beren | 2 Mar 2025 0:21 UTC | 67 points | 6 comments | 11 min read
Capital Ownership Will Not Prevent Human Disempowerment
beren | 5 Jan 2025 6:00 UTC | 152 points | 19 comments | 14 min read
[Question] When and why did ‘training’ become ‘pretraining’?
beren | 8 Mar 2024 14:29 UTC | 16 points | 6 comments | 1 min read
Theories of Change for AI Auditing
Lee Sharkey, beren and Marius Hobbhahn | 13 Nov 2023 19:33 UTC | 54 points | 0 comments | 18 min read | (www.apolloresearch.ai)
[Linkpost] Biden-Harris Executive Order on AI
beren | 30 Oct 2023 15:20 UTC | 3 points | 0 comments | 1 min read
Preference Aggregation as Bayesian Inference
beren | 27 Jul 2023 17:59 UTC | 14 points | 1 comment | 1 min read
Thoughts on Loss Landscapes and why Deep Learning works
beren | 25 Jul 2023 16:41 UTC | 54 points | 4 comments | 18 min read
BCIs and the ecosystem of modular minds
beren | 21 Jul 2023 15:58 UTC | 88 points | 14 comments | 11 min read
Hedonic Loops and Taming RL
beren | 19 Jul 2023 15:12 UTC | 20 points | 14 comments | 9 min read
[Linkpost] Introducing Superalignment
beren | 5 Jul 2023 18:23 UTC | 175 points | 69 comments | 1 min read | (openai.com)
The case for removing alignment and ML research from the training dataset
beren | 30 May 2023 20:54 UTC | 48 points | 8 comments | 5 min read
Announcing Apollo Research
Marius Hobbhahn, beren, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni and Jérémy Scheurer | 30 May 2023 16:17 UTC | 217 points | 11 comments | 8 min read
A small update to the Sparse Coding interim research report
Lee Sharkey, Dan Braun and beren | 30 Apr 2023 19:54 UTC | 61 points | 5 comments | 1 min read
Deep learning models might be secretly (almost) linear
beren | 24 Apr 2023 18:43 UTC | 117 points | 29 comments | 4 min read
Scaffolded LLMs as natural language computers
beren | 12 Apr 2023 10:47 UTC | 95 points | 10 comments | 11 min read
The surprising parameter efficiency of vision models
beren | 8 Apr 2023 19:44 UTC | 81 points | 28 comments | 4 min read
The Computational Anatomy of Human Values
beren | 6 Apr 2023 10:33 UTC | 74 points | 30 comments | 30 min read
Orthogonality is expensive
beren | 3 Apr 2023 10:20 UTC | 43 points | 9 comments | 3 min read
RLHF does not appear to differentially cause mode-collapse
Arthur Conmy and beren | 20 Mar 2023 15:39 UTC | 95 points | 9 comments | 3 min read
Against ubiquitous alignment taxes
beren | 6 Mar 2023 19:50 UTC | 57 points | 10 comments | 2 min read