William_S (William Saunders)

Karma: 659

Member of the OpenAI scalable alignment team

HCH is not just Mechanical Turk

William_S · 9 Feb 2019 0:46 UTC
42 points
6 comments · 3 min read · LW link

Understanding Iterated Distillation and Amplification: Claims and Oversight

William_S · 17 Apr 2018 22:36 UTC
34 points
30 comments · 9 min read · LW link

Thoughts on refusing harmful requests to large language models

William_S · 19 Jan 2023 19:49 UTC
30 points
4 comments · 2 min read · LW link

Reinforcement Learning in the Iterated Amplification Framework

William_S · 9 Feb 2019 0:56 UTC
25 points
12 comments · 4 min read · LW link

Amplification Discussion Notes

William_S · 1 Jun 2018 19:03 UTC
17 points
3 comments · 3 min read · LW link

[Question] Is there an intuitive way to explain how much better superforecasters are than regular forecasters?

William_S · 19 Feb 2020 1:07 UTC
16 points
5 comments · 1 min read · LW link

Improbable Oversight, An Attempt at Informed Oversight

William_S · 24 May 2017 17:43 UTC
3 points
9 comments · 1 min read · LW link
(william-r-s.github.io)

Proposal for an Implementable Toy Model of Informed Oversight

William_S · 24 May 2017 17:43 UTC
2 points
1 comment · 1 min read · LW link
(william-r-s.github.io)

Informed Oversight through Generalizing Explanations

William_S · 24 May 2017 17:43 UTC
2 points
0 comments · 1 min read · LW link
(william-r-s.github.io)