Buck (Buck Shlegeris) · Karma: 5,640
“Other people are wrong” vs “I am right” — Buck · 22 Feb 2019 20:01 UTC · 246 points · 20 comments · 9 min read · LW link · 2 reviews
Six economics misconceptions of mine which I’ve resolved over the last few years — Buck · 13 Jul 2020 3:01 UTC · 194 points · 59 comments · 9 min read · LW link
AI Control: Improving Safety Despite Intentional Subversion — Buck, Fabien Roger, ryan_greenblatt and Kshitij Sachan · 13 Dec 2023 15:51 UTC · 190 points · 4 comments · 10 min read · LW link
Language models seem to be much better than humans at next-token prediction — Buck, Fabien Roger and LawrenceC · 11 Aug 2022 17:45 UTC · 182 points · 59 comments · 13 min read · LW link · 1 review
The prototypical catastrophic AI action is getting root access to its datacenter — Buck · 2 Jun 2022 23:46 UTC · 164 points · 13 comments · 2 min read · LW link · 1 review
Worst-case thinking in AI alignment — Buck · 23 Dec 2021 1:29 UTC · 162 points · 18 comments · 6 min read · LW link · 2 reviews
A freshman year during the AI midgame: my approach to the next year — Buck · 14 Apr 2023 0:38 UTC · 146 points · 14 comments · 1 min read · LW link
Redwood Research’s current project — Buck · 21 Sep 2021 23:30 UTC · 145 points · 29 comments · 15 min read · LW link · 1 review
Takeoff speeds have a huge effect on what it means to work on AI x-risk — Buck · 13 Apr 2022 17:38 UTC · 139 points · 27 comments · 2 min read · LW link · 2 reviews
The theory-practice gap — Buck · 17 Sep 2021 22:51 UTC · 138 points · 15 comments · 6 min read · LW link
The case for becoming a black-box investigator of language models — Buck · 6 May 2022 14:35 UTC · 125 points · 20 comments · 3 min read · LW link
One-layer transformers aren’t equivalent to a set of skip-trigrams — Buck · 17 Feb 2023 17:26 UTC · 119 points · 10 comments · 7 min read · LW link
Trying to disambiguate different questions about whether RLHF is “good” — Buck · 14 Dec 2022 4:03 UTC · 106 points · 47 comments · 7 min read · LW link · 1 review
Funds are available to support LessWrong groups, among others — Buck and ClaireZabel · 21 Jul 2021 1:11 UTC · 88 points · 3 comments · 1 min read · LW link
The alignment problem in different capability regimes — Buck · 9 Sep 2021 19:46 UTC · 88 points · 12 comments · 5 min read · LW link
Some thoughts on criticism — Buck · 18 Sep 2020 4:58 UTC · 88 points · 11 comments · 5 min read · LW link
Polysemanticity and Capacity in Neural Networks — Buck, Adam Jermyn and Kshitij Sachan · 7 Oct 2022 17:51 UTC · 87 points · 14 comments · 3 min read · LW link
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy — Buck and ryan_greenblatt · 26 Jul 2023 17:02 UTC · 83 points · 18 comments · 1 min read · LW link
How good is humanity at coordination? — Buck · 21 Jul 2020 20:01 UTC · 82 points · 44 comments · 3 min read · LW link
Untrusted smart models and trusted dumb models — Buck · 4 Nov 2023 3:06 UTC · 80 points · 12 comments · 6 min read · LW link