William_S (William Saunders)
Karma: 659
Member of the OpenAI scalable alignment team.

HCH is not just Mechanical Turk
William_S · 9 Feb 2019 0:46 UTC · 42 points · 6 comments · 3 min read · LW link

Understanding Iterated Distillation and Amplification: Claims and Oversight
William_S · 17 Apr 2018 22:36 UTC · 34 points · 30 comments · 9 min read · LW link

Thoughts on refusing harmful requests to large language models
William_S · 19 Jan 2023 19:49 UTC · 30 points · 4 comments · 2 min read · LW link

Reinforcement Learning in the Iterated Amplification Framework
William_S · 9 Feb 2019 0:56 UTC · 25 points · 12 comments · 4 min read · LW link

Amplification Discussion Notes
William_S · 1 Jun 2018 19:03 UTC · 17 points · 3 comments · 3 min read · LW link

[Question] Is there an intuitive way to explain how much better superforecasters are than regular forecasters?
William_S · 19 Feb 2020 1:07 UTC · 16 points · 5 comments · 1 min read · LW link

Improbable Oversight, An Attempt at Informed Oversight
William_S · 24 May 2017 17:43 UTC · 3 points · 9 comments · 1 min read · LW link (william-r-s.github.io)

Proposal for an Implementable Toy Model of Informed Oversight
William_S · 24 May 2017 17:43 UTC · 2 points · 1 comment · 1 min read · LW link (william-r-s.github.io)

Informed Oversight through Generalizing Explanations
William_S · 24 May 2017 17:43 UTC · 2 points · 0 comments · 1 min read · LW link (william-r-s.github.io)