Buck (Buck Shlegeris) · Karma: 5,640
“Other people are wrong” vs “I am right” — Buck · 22 Feb 2019 20:01 UTC · 246 points · 20 comments · 9 min read · LW link · 2 reviews
Six economics misconceptions of mine which I’ve resolved over the last few years — Buck · 13 Jul 2020 3:01 UTC · 194 points · 59 comments · 9 min read · LW link
AI Control: Improving Safety Despite Intentional Subversion — Buck, Fabien Roger, ryan_greenblatt and Kshitij Sachan · 13 Dec 2023 15:51 UTC · 190 points · 4 comments · 10 min read · LW link
Language models seem to be much better than humans at next-token prediction — Buck, Fabien Roger and LawrenceC · 11 Aug 2022 17:45 UTC · 182 points · 59 comments · 13 min read · LW link · 1 review
The prototypical catastrophic AI action is getting root access to its datacenter — Buck · 2 Jun 2022 23:46 UTC · 164 points · 13 comments · 2 min read · LW link · 1 review
Worst-case thinking in AI alignment — Buck · 23 Dec 2021 1:29 UTC · 162 points · 18 comments · 6 min read · LW link · 2 reviews
A freshman year during the AI midgame: my approach to the next year — Buck · 14 Apr 2023 0:38 UTC · 146 points · 14 comments · 1 min read · LW link
Redwood Research’s current project — Buck · 21 Sep 2021 23:30 UTC · 145 points · 29 comments · 15 min read · LW link · 1 review
Takeoff speeds have a huge effect on what it means to work on AI x-risk — Buck · 13 Apr 2022 17:38 UTC · 139 points · 27 comments · 2 min read · LW link · 2 reviews
The theory-practice gap — Buck · 17 Sep 2021 22:51 UTC · 138 points · 15 comments · 6 min read · LW link
The case for becoming a black-box investigator of language models — Buck · 6 May 2022 14:35 UTC · 125 points · 20 comments · 3 min read · LW link
One-layer transformers aren’t equivalent to a set of skip-trigrams — Buck · 17 Feb 2023 17:26 UTC · 119 points · 10 comments · 7 min read · LW link
Trying to disambiguate different questions about whether RLHF is “good” — Buck · 14 Dec 2022 4:03 UTC · 106 points · 47 comments · 7 min read · LW link · 1 review
Funds are available to support LessWrong groups, among others — Buck and ClaireZabel · 21 Jul 2021 1:11 UTC · 88 points · 3 comments · 1 min read · LW link
The alignment problem in different capability regimes — Buck · 9 Sep 2021 19:46 UTC · 88 points · 12 comments · 5 min read · LW link
Some thoughts on criticism — Buck · 18 Sep 2020 4:58 UTC · 88 points · 11 comments · 5 min read · LW link
Polysemanticity and Capacity in Neural Networks — Buck, Adam Jermyn and Kshitij Sachan · 7 Oct 2022 17:51 UTC · 87 points · 14 comments · 3 min read · LW link
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy — Buck and ryan_greenblatt · 26 Jul 2023 17:02 UTC · 83 points · 18 comments · 1 min read · LW link
How good is humanity at coordination? — Buck · 21 Jul 2020 20:01 UTC · 82 points · 44 comments · 3 min read · LW link
Untrusted smart models and trusted dumb models — Buck · 4 Nov 2023 3:06 UTC · 80 points · 12 comments · 6 min read · LW link