Buck (Buck Shlegeris)

Karma: 5,640

“Other people are wrong” vs “I am right”

Buck · 22 Feb 2019 20:01 UTC
246 points
20 comments · 9 min read · LW link · 2 reviews

Six economics misconceptions of mine which I’ve resolved over the last few years

Buck · 13 Jul 2020 3:01 UTC
194 points
59 comments · 9 min read · LW link

AI Control: Improving Safety Despite Intentional Subversion

13 Dec 2023 15:51 UTC
190 points
4 comments · 10 min read · LW link

Language models seem to be much better than humans at next-token prediction

11 Aug 2022 17:45 UTC
182 points
59 comments · 13 min read · LW link · 1 review

The prototypical catastrophic AI action is getting root access to its datacenter

Buck · 2 Jun 2022 23:46 UTC
164 points
13 comments · 2 min read · LW link · 1 review

Worst-case thinking in AI alignment

Buck · 23 Dec 2021 1:29 UTC
162 points
18 comments · 6 min read · LW link · 2 reviews

A freshman year during the AI midgame: my approach to the next year

Buck · 14 Apr 2023 0:38 UTC
146 points
14 comments · 1 min read · LW link

Redwood Research’s current project

Buck · 21 Sep 2021 23:30 UTC
145 points
29 comments · 15 min read · LW link · 1 review

Takeoff speeds have a huge effect on what it means to work on AI x-risk

Buck · 13 Apr 2022 17:38 UTC
139 points
27 comments · 2 min read · LW link · 2 reviews

The theory-practice gap

Buck · 17 Sep 2021 22:51 UTC
138 points
15 comments · 6 min read · LW link

The case for becoming a black-box investigator of language models

Buck · 6 May 2022 14:35 UTC
125 points
20 comments · 3 min read · LW link

One-layer transformers aren’t equivalent to a set of skip-trigrams

Buck · 17 Feb 2023 17:26 UTC
119 points
10 comments · 7 min read · LW link

Trying to disambiguate different questions about whether RLHF is “good”

Buck · 14 Dec 2022 4:03 UTC
106 points
47 comments · 7 min read · LW link · 1 review

Funds are available to support LessWrong groups, among others

21 Jul 2021 1:11 UTC
88 points
3 comments · 1 min read · LW link

The alignment problem in different capability regimes

Buck · 9 Sep 2021 19:46 UTC
88 points
12 comments · 5 min read · LW link

Some thoughts on criticism

Buck · 18 Sep 2020 4:58 UTC
88 points
11 comments · 5 min read · LW link

Polysemanticity and Capacity in Neural Networks

7 Oct 2022 17:51 UTC
87 points
14 comments · 3 min read · LW link

Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy

26 Jul 2023 17:02 UTC
83 points
18 comments · 1 min read · LW link

How good is humanity at coordination?

Buck · 21 Jul 2020 20:01 UTC
82 points
44 comments · 3 min read · LW link

Untrusted smart models and trusted dumb models

Buck · 4 Nov 2023 3:06 UTC
80 points
12 comments · 6 min read · LW link