beren (Beren Millidge)

Karma: 2,728

Interested in many things. I have a personal blog at https://www.beren.io/

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

beren and Sid Black

28 Nov 2022 12:54 UTC

197 points

33 comments · 31 min read · LW link

[Linkpost] Introducing Superalignment

beren

5 Jul 2023 18:23 UTC

173 points

68 comments · 1 min read · LW link

(openai.com)

Gradient hacking is extremely difficult

beren

24 Jan 2023 15:45 UTC

161 points

22 comments · 5 min read · LW link

Basic Facts about Language Model Internals

beren and Eric Winsor

4 Jan 2023 13:01 UTC

129 points

18 comments · 9 min read · LW link

Deconfusing Direct vs Amortised Optimization

beren

2 Dec 2022 11:30 UTC

113 points

19 comments · 10 min read · LW link

Deep learning models might be secretly (almost) linear

beren

24 Apr 2023 18:43 UTC

111 points

29 comments · 4 min read · LW link

Basic facts about language models during training

beren

21 Feb 2023 11:46 UTC

97 points

14 comments · 18 min read · LW link

Scaffolded LLMs as natural language computers

beren

12 Apr 2023 10:47 UTC

94 points

10 comments · 11 min read · LW link

BCIs and the ecosystem of modular minds

beren

21 Jul 2023 15:58 UTC

88 points

14 comments · 11 min read · LW link

The surprising parameter efficiency of vision models

beren

8 Apr 2023 19:44 UTC

77 points

28 comments · 4 min read · LW link

The Computational Anatomy of Human Values

beren

6 Apr 2023 10:33 UTC

70 points

30 comments · 30 min read · LW link

Against ubiquitous alignment taxes

beren

6 Mar 2023 19:50 UTC

56 points

10 comments · 2 min read · LW link

Thoughts on Loss Landscapes and why Deep Learning works

beren

25 Jul 2023 16:41 UTC

53 points

4 comments · 18 min read · LW link

The case for removing alignment and ML research from the training dataset

beren

30 May 2023 20:54 UTC

48 points

8 comments · 5 min read · LW link

Empathy as a natural consequence of learnt reward models

beren

4 Feb 2023 15:35 UTC

46 points

27 comments · 13 min read · LW link

Scaling laws vs individual differences

beren

10 Jan 2023 13:22 UTC

44 points

21 comments · 7 min read · LW link

Orthogonality is expensive

beren

3 Apr 2023 10:20 UTC

42 points

9 comments · 3 min read · LW link

An ML interpretation of Shard Theory

beren

3 Jan 2023 20:30 UTC

39 points

5 comments · 4 min read · LW link

Human sexuality as an interesting case study of alignment

beren

30 Dec 2022 13:37 UTC

39 points

26 comments · 3 min read · LW link

AGI will have learnt utility functions

beren

25 Jan 2023 19:42 UTC

36 points

3 comments · 13 min read · LW link