beren

Karma: 3,162

Interested in many things. I have a personal blog at https://www.beren.io/

When does competition lead to recognisable values?

Jan_Kulveit, beren, David Duvenaud and Raymond Douglas

12 Jan 2026 23:13 UTC

65 points

18 comments · 25 min read · LW link

(postagi.org)

Maintaining Alignment during RSI as a Feedback Control Problem

beren

2 Mar 2025 0:21 UTC

76 points

6 comments · 11 min read · LW link

Capital Ownership Will Not Prevent Human Disempowerment

beren

5 Jan 2025 6:00 UTC

164 points

21 comments · 14 min read · LW link

[Question] When and why did ‘training’ become ‘pretraining’?

beren

8 Mar 2024 14:29 UTC

16 points

6 comments · 1 min read · LW link

Theories of Change for AI Auditing

Lee Sharkey, beren and Marius Hobbhahn

13 Nov 2023 19:33 UTC

54 points

0 comments · 18 min read · LW link

(www.apolloresearch.ai)

[Linkpost] Biden-Harris Executive Order on AI

beren

30 Oct 2023 15:20 UTC

3 points

0 comments · 1 min read · LW link

Preference Aggregation as Bayesian Inference

beren

27 Jul 2023 17:59 UTC

14 points

1 comment · 1 min read · LW link

Thoughts on Loss Landscapes and why Deep Learning works

beren

25 Jul 2023 16:41 UTC

54 points

4 comments · 18 min read · LW link

BCIs and the ecosystem of modular minds

beren

21 Jul 2023 15:58 UTC

88 points

14 comments · 11 min read · LW link

Hedonic Loops and Taming RL

beren

19 Jul 2023 15:12 UTC

20 points

14 comments · 9 min read · LW link

[Linkpost] Introducing Superalignment

beren

5 Jul 2023 18:23 UTC

175 points

69 comments · 1 min read · LW link

(openai.com)

The case for removing alignment and ML research from the training dataset

beren

30 May 2023 20:54 UTC

50 points

8 comments · 5 min read · LW link

Announcing Apollo Research

Marius Hobbhahn, beren, Lee Sharkey, Lucius Bushnaq, Dan Braun, Mikita Balesni and Jérémy Scheurer

30 May 2023 16:17 UTC

226 points

11 comments · 8 min read · LW link

A small update to the Sparse Coding interim research report

Lee Sharkey, Dan Braun and beren

30 Apr 2023 19:54 UTC

61 points

5 comments · 1 min read · LW link

Deep learning models might be secretly (almost) linear

beren

24 Apr 2023 18:43 UTC

117 points

29 comments · 4 min read · LW link

Scaffolded LLMs as natural language computers

beren

12 Apr 2023 10:47 UTC

97 points

10 comments · 11 min read · LW link

The surprising parameter efficiency of vision models

beren

8 Apr 2023 19:44 UTC

81 points

28 comments · 4 min read · LW link

The Computational Anatomy of Human Values

beren

6 Apr 2023 10:33 UTC

76 points

30 comments · 30 min read · LW link

Orthogonality is expensive

beren

3 Apr 2023 10:20 UTC

43 points

9 comments · 3 min read · LW link

RLHF does not appear to differentially cause mode-collapse

Arthur Conmy and beren

20 Mar 2023 15:39 UTC

95 points

9 comments · 3 min read · LW link