Statement on AI Extinction—Signed by AGI Labs, Top Academics, and Many Other Notable Figures

Dan H · 30 May 2023 9:05 UTC
317 points
50 comments · 1 min read · LW link
(www.safe.ai)

Think carefully before calling RL policies “agents”

TurnTrout · 2 Jun 2023 3:46 UTC
42 points
1 comment · 5 min read · LW link

Using GPT-Eliezer against ChatGPT Jailbreaking

6 Dec 2022 19:54 UTC
170 points
85 comments · 9 min read · LW link

Shutdown-Seeking AI

Simon Goldstein · 31 May 2023 22:19 UTC
20 points
11 comments · 15 min read · LW link

Making Nanobots isn’t a one-shot process, even for an artificial superintelligence

dankrad · 25 Apr 2023 0:39 UTC
18 points
10 comments · 6 min read · LW link

Steering GPT-2-XL by adding an activation vector

13 May 2023 18:42 UTC
378 points
74 comments · 50 min read · LW link

Is Deontological AI Safe? [Feedback Draft]

27 May 2023 16:39 UTC
20 points
13 comments · 20 min read · LW link

Short Remark on the (subjective) mathematical ‘naturalness’ of the Nanda–Lieberum addition modulo 113 algorithm

Spencer Becker-Kahn · 1 Jun 2023 11:31 UTC
61 points
2 comments · 2 min read · LW link

PaLM-2 & GPT-4 in “Extrapolating GPT-N performance”

Lukas Finnveden · 30 May 2023 18:33 UTC
51 points
5 comments · 6 min read · LW link

Conditional Prediction with Zero-Sum Training Solves Self-Fulfilling Prophecies

26 May 2023 17:44 UTC
86 points
12 comments · 24 min read · LW link

[Question] Seriously, what goes wrong with “reward the agent when it makes you smile”?

TurnTrout · 11 Aug 2022 22:22 UTC
81 points
42 comments · 2 min read · LW link

Reward is not the optimization target

TurnTrout · 25 Jul 2022 0:03 UTC
291 points
109 comments · 10 min read · LW link

A shot at the diamond-alignment problem

TurnTrout · 6 Oct 2022 18:29 UTC
92 points
57 comments · 15 min read · LW link

Power-seeking can be probable and predictive for trained agents

28 Feb 2023 21:10 UTC
53 points
10 comments · 9 min read · LW link
(arxiv.org)

EIS VI: Critiques of Mechanistic Interpretability Work in AI Safety

scasper · 17 Feb 2023 20:48 UTC
38 points
9 comments · 12 min read · LW link

Language Agents Reduce the Risk of Existential Catastrophe

28 May 2023 19:10 UTC
24 points
12 comments · 26 min read · LW link

Acausal trade: Introduction

Stuart_Armstrong · 11 May 2017 12:03 UTC
1 point
1 comment · 1 min read · LW link

There are no coherence theorems

20 Feb 2023 21:25 UTC
90 points
99 comments · 19 min read · LW link

Don’t leave your fingerprints on the future

So8res · 8 Oct 2022 0:35 UTC
109 points
33 comments · 5 min read · LW link

Announcing Apollo Research

30 May 2023 16:17 UTC
188 points
7 comments · 8 min read · LW link