
ThomasW

Karma: 1,228

Center for AI Safety

Risks from AI Overview: Summary

18 Aug 2023 1:21 UTC
25 points
0 comments, 13 min read, LW link
(www.safe.ai)

Catastrophic Risks from AI #6: Discussion and FAQ

27 Jun 2023 23:23 UTC
24 points
1 comment, 13 min read, LW link
(arxiv.org)

Catastrophic Risks from AI #5: Rogue AIs

27 Jun 2023 22:06 UTC
15 points
0 comments, 22 min read, LW link
(arxiv.org)

Catastrophic Risks from AI #4: Organizational Risks

26 Jun 2023 19:36 UTC
23 points
0 comments, 21 min read, LW link
(arxiv.org)

Catastrophic Risks from AI #3: AI Race

23 Jun 2023 19:21 UTC
18 points
9 comments, 29 min read, LW link
(arxiv.org)

Catastrophic Risks from AI #2: Malicious Use

22 Jun 2023 17:10 UTC
38 points
1 comment, 17 min read, LW link
(arxiv.org)

Catastrophic Risks from AI #1: Introduction

22 Jun 2023 17:09 UTC
40 points
1 comment, 5 min read, LW link
(arxiv.org)

[MLSN #9] Verifying large training runs, security risks from LLM access to APIs, why natural selection may favor AIs over humans

11 Apr 2023 16:03 UTC
11 points
0 comments, 6 min read, LW link
(newsletter.mlsafety.org)

[MLSN #8] Mechanistic interpretability, using law to inform AI alignment, scaling laws for proxy gaming

20 Feb 2023 15:54 UTC
20 points
0 comments, 4 min read, LW link
(newsletter.mlsafety.org)

What’s the deal with AI consciousness?

ThomasW 11 Jan 2023 16:37 UTC
6 points
13 comments, 9 min read, LW link
(aiwatchtower.substack.com)

Implications of simulators

ThomasW 7 Jan 2023 0:37 UTC
17 points
0 comments, 12 min read, LW link

“AI” is an indexical

ThomasW 3 Jan 2023 22:00 UTC
10 points
0 comments, 6 min read, LW link
(aiwatchtower.substack.com)

A Year of AI Increasing AI Progress

ThomasW 30 Dec 2022 2:09 UTC
148 points
3 comments, 2 min read, LW link

Did ChatGPT just gaslight me?

ThomasW 1 Dec 2022 5:41 UTC
123 points
45 comments, 9 min read, LW link
(aiwatchtower.substack.com)

A philosopher’s critique of RLHF

ThomasW 7 Nov 2022 2:42 UTC
55 points
8 comments, 2 min read, LW link

ML Safety Scholars Summer 2022 Retrospective

ThomasW 1 Nov 2022 3:09 UTC
29 points
0 comments, 1 min read, LW link

Announcing the Introduction to ML Safety course

6 Aug 2022 2:46 UTC
73 points
6 comments, 7 min read, LW link

$20K In Bounties for AI Safety Public Materials

5 Aug 2022 2:52 UTC
71 points
9 comments, 6 min read, LW link

Examples of AI Increasing AI Progress

ThomasW 17 Jul 2022 20:06 UTC
107 points
14 comments, 1 min read, LW link

Open Problems in AI X-Risk [PAIS #5]

10 Jun 2022 2:08 UTC
59 points
6 comments, 36 min read, LW link