beren (Beren Millidge)

Karma: 2,682

Interested in many things. I have a personal blog at https://www.beren.io/

The Singular Value Decompositions of Transformer Weight Matrices are Highly Interpretable

28 Nov 2022 12:54 UTC
195 points
33 comments · 31 min read · LW link

[Linkpost] Introducing Superalignment

beren · 5 Jul 2023 18:23 UTC
173 points
68 comments · 1 min read · LW link
(openai.com)

Gradient hacking is extremely difficult

beren · 24 Jan 2023 15:45 UTC
161 points
22 comments · 5 min read · LW link

Basic Facts about Language Model Internals

4 Jan 2023 13:01 UTC
130 points
18 comments · 9 min read · LW link

Deep learning models might be secretly (almost) linear

beren · 24 Apr 2023 18:43 UTC
110 points
28 comments · 4 min read · LW link

Deconfusing Direct vs Amortised Optimization

beren · 2 Dec 2022 11:30 UTC
107 points
17 comments · 10 min read · LW link

Basic facts about language models during training

beren · 21 Feb 2023 11:46 UTC
96 points
14 comments · 18 min read · LW link

Scaffolded LLMs as natural language computers

beren · 12 Apr 2023 10:47 UTC
92 points
10 comments · 11 min read · LW link

BCIs and the ecosystem of modular minds

beren · 21 Jul 2023 15:58 UTC
84 points
14 comments · 11 min read · LW link

The surprising parameter efficiency of vision models

beren · 8 Apr 2023 19:44 UTC
77 points
28 comments · 4 min read · LW link

The Computational Anatomy of Human Values

beren · 6 Apr 2023 10:33 UTC
70 points
30 comments · 30 min read · LW link

Against ubiquitous alignment taxes

beren · 6 Mar 2023 19:50 UTC
56 points
10 comments · 2 min read · LW link

Thoughts on Loss Landscapes and why Deep Learning works

beren · 25 Jul 2023 16:41 UTC
52 points
4 comments · 18 min read · LW link

The case for removing alignment and ML research from the training dataset

beren · 30 May 2023 20:54 UTC
48 points
8 comments · 5 min read · LW link

Empathy as a natural consequence of learnt reward models

beren · 4 Feb 2023 15:35 UTC
46 points
27 comments · 13 min read · LW link

Scaling laws vs individual differences

beren · 10 Jan 2023 13:22 UTC
44 points
21 comments · 7 min read · LW link

An ML interpretation of Shard Theory

beren · 3 Jan 2023 20:30 UTC
39 points
5 comments · 4 min read · LW link

Human sexuality as an interesting case study of alignment

beren · 30 Dec 2022 13:37 UTC
39 points
26 comments · 3 min read · LW link