scasper (Stephen Casper)

Karma: 1,565

https://stephencasper.com/

The 6D effect: When companies take risks, one email can be very powerful.

scasper · 4 Nov 2023 20:08 UTC
261 points
40 comments · 3 min read · LW link

Deep Forgetting & Unlearning for Safely-Scoped LLMs

scasper · 5 Dec 2023 16:48 UTC
109 points
29 comments · 13 min read · LW link

[Linkpost] A survey on over 300 works about interpretability in deep networks

scasper · 12 Sep 2022 19:07 UTC
97 points
7 comments · 2 min read · LW link
(arxiv.org)

Takeaways from the Mechanistic Interpretability Challenges

scasper · 8 Jun 2023 18:56 UTC
93 points
5 comments · 6 min read · LW link

Analogies between scaling labs and misaligned superintelligent AI

scasper · 21 Feb 2024 19:29 UTC
72 points
4 comments · 4 min read · LW link

Open Problems and Fundamental Limitations of RLHF

scasper · 31 Jul 2023 15:31 UTC
66 points
6 comments · 2 min read · LW link
(arxiv.org)

EIS V: Blind Spots In AI Safety Interpretability Research

scasper · 16 Feb 2023 19:09 UTC
54 points
23 comments · 13 min read · LW link

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper · 17 Feb 2023 20:48 UTC
48 points
9 comments · 12 min read · LW link

The Engineer’s Interpretability Sequence (EIS) I: Intro

scasper · 9 Feb 2023 16:28 UTC
45 points
24 comments · 3 min read · LW link

Eight Strategies for Tackling the Hard Part of the Alignment Problem

scasper · 8 Jul 2023 18:55 UTC
42 points
11 comments · 7 min read · LW link

Existential AI Safety is NOT separate from near-term applications

scasper · 13 Dec 2022 14:47 UTC
37 points
17 comments · 3 min read · LW link

EIS VII: A Challenge for Mechanists

scasper · 18 Feb 2023 18:27 UTC
35 points
4 comments · 3 min read · LW link

Dissolving Confusion around Functional Decision Theory

scasper · 5 Jan 2020 6:38 UTC
32 points
24 comments · 9 min read · LW link

Where to be an AI Safety Professor

scasper · 7 Dec 2022 7:09 UTC
30 points
12 comments · 2 min read · LW link

EIS IX: Interpretability and Adversaries

scasper · 20 Feb 2023 18:25 UTC
30 points
7 comments · 8 min read · LW link

Deep Dives: My Advice for Pursuing Work in Research

scasper · 11 Mar 2022 17:56 UTC
29 points
2 comments · 3 min read · LW link

EIS II: What is “Interpretability”?

scasper · 9 Feb 2023 16:48 UTC
28 points
6 comments · 4 min read · LW link