William_S (William Saunders)

Karma: 659

Member of the OpenAI scalable alignment team

HCH is not just Mechanical Turk

William_S · 9 Feb 2019 0:46 UTC
42 points
6 comments · 3 min read · LW link

Understanding Iterated Distillation and Amplification: Claims and Oversight

William_S · 17 Apr 2018 22:36 UTC
34 points
30 comments · 9 min read · LW link

Thoughts on refusing harmful requests to large language models

William_S · 19 Jan 2023 19:49 UTC
30 points
4 comments · 2 min read · LW link

Reinforcement Learning in the Iterated Amplification Framework

William_S · 9 Feb 2019 0:56 UTC
25 points
12 comments · 4 min read · LW link

Amplification Discussion Notes

William_S · 1 Jun 2018 19:03 UTC
17 points
3 comments · 3 min read · LW link

[Question] Is there an intuitive way to explain how much better superforecasters are than regular forecasters?

William_S · 19 Feb 2020 1:07 UTC
16 points
5 comments · 1 min read · LW link

Improbable Oversight, An Attempt at Informed Oversight

William_S · 24 May 2017 17:43 UTC
3 points
9 comments · 1 min read · LW link
(william-r-s.github.io)

Proposal for an Implementable Toy Model of Informed Oversight

William_S · 24 May 2017 17:43 UTC
2 points
1 comment · 1 min read · LW link
(william-r-s.github.io)

Informed Oversight through Generalizing Explanations

William_S · 24 May 2017 17:43 UTC
2 points
0 comments · 1 min read · LW link
(william-r-s.github.io)