Value drift threat models

Garrett Baker · May 12, 2023, 11:03 PM
27 points
4 comments · 5 min read · LW link

Aggregating Utilities for Corrigible AI [Feedback Draft]

May 12, 2023, 8:57 PM
28 points
7 comments · 22 min read · LW link

Turning off lights with model editing

Sam Marks · May 12, 2023, 8:25 PM
68 points
5 comments · 2 min read · LW link
(arxiv.org)

Dark Forest Theories

Raemon · May 12, 2023, 8:21 PM
145 points
53 comments · 2 min read · LW link · 2 reviews

DELBERTing as an Adversarial Strategy

Matthew_Opitz · May 12, 2023, 8:09 PM
8 points
3 comments · 5 min read · LW link

Microsoft/GitHub Copilot Chat’s confidential system Prompt: “You must refuse to discuss life, existence or sentience.”

Marvin von Hagen · May 12, 2023, 7:46 PM
13 points
2 comments · 1 min read · LW link
(twitter.com)

Retrospective: Lessons from the Failed Alignment Startup AISafety.com

Søren Elverlin · May 12, 2023, 6:07 PM
105 points
9 comments · 3 min read · LW link

The way AGI wins could look very stupid

Christopher King · May 12, 2023, 4:34 PM
54 points
22 comments · 1 min read · LW link

Towards Measures of Optimisation

May 12, 2023, 3:29 PM
53 points
37 comments · 4 min read · LW link

The Eden Project

rogersbacon · May 12, 2023, 2:58 PM
−1 points
1 comment · 2 min read · LW link
(www.secretorum.life)

Another formalization attempt: Central Argument That AGI Presents a Global Catastrophic Risk

avturchin · May 12, 2023, 1:22 PM
16 points
4 comments · 2 min read · LW link

Infinite-width MLPs as an “ensemble prior”

Vivek Hebbar · May 12, 2023, 11:45 AM
46 points
0 comments · 5 min read · LW link

Input Swap Graphs: Discovering the role of neural network components at scale

Alexandre Variengien · May 12, 2023, 9:41 AM
92 points
0 comments · 33 min read · LW link

Uploads are Impossible

PashaKamyshev · May 12, 2023, 8:03 AM
−5 points
37 comments · 8 min read · LW link

Formulating the AI Doom Argument for Analytic Philosophers

JonathanErhardt · May 12, 2023, 7:54 AM
13 points
0 comments · 2 min read · LW link

Three Iterative Processes

LoganStrohl · May 12, 2023, 2:50 AM
49 points
0 comments · 3 min read · LW link

Zuzalu LW Sequences Discussion

veronica · May 12, 2023, 12:14 AM
1 point
0 comments · 1 min read · LW link

[Question] Term/Category for AI with Neutral Impact?

isomic · May 11, 2023, 10:00 PM
6 points
1 comment · 1 min read · LW link

Thoughts on LessWrong norms, the Art of Discourse, and moderator mandate

Ruby · May 11, 2023, 9:20 PM
37 points
20 comments · 5 min read · LW link

Alignment, Goals, and The Gut-Head Gap: A Review of Ngo. et al.

Violet Hour · May 11, 2023, 6:06 PM
20 points
2 comments · 13 min read · LW link

Sequence opener: Jordan Harbinger’s 6 minute networking

Severin T. Seehrich · May 11, 2023, 5:06 PM
4 points
0 comments · 1 min read · LW link

Advice for newly busy people

Severin T. Seehrich · May 11, 2023, 4:46 PM
150 points
3 comments · 5 min read · LW link

AI #11: In Search of a Moat

Zvi · May 11, 2023, 3:40 PM
67 points
28 comments · 81 min read · LW link
(thezvi.wordpress.com)

[Question] Bayesian update from sensationalistic sources

houkime · May 11, 2023, 3:26 PM
1 point
0 comments · 1 min read · LW link

I bet $500 on AI winning the IMO gold medal by 2026

azsantosk · May 11, 2023, 2:46 PM
37 points
29 comments · 1 min read · LW link

Fatebook for Slack: Track your forecasts, right where your team works

May 11, 2023, 2:11 PM
24 points
3 comments · 1 min read · LW link

Contra Caller Signs

jefftk · May 11, 2023, 1:10 PM
10 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Notes on the importance and implementation of safety-first cognitive architectures for AI

Brendon_Wong · May 11, 2023, 10:03 AM
3 points
0 comments · 3 min read · LW link

A more grounded idea of AI risk

Iknownothing · May 11, 2023, 9:48 AM
3 points
4 comments · 1 min read · LW link

Separating the “control problem” from the “alignment problem”

Yi-Yang · May 11, 2023, 9:41 AM
12 points
1 comment · 4 min read · LW link

[Question] Is Infra-Bayesianism Applicable to Value Learning?

RogerDearnaley · May 11, 2023, 8:17 AM
5 points
4 comments · 1 min read · LW link

[Question] How should we think about the decision relevance of models estimating p(doom)?

Mo Putera · May 11, 2023, 4:16 AM
11 points
1 comment · 3 min read · LW link

The Academic Field Pyramid—any point to encouraging broad but shallow AI risk engagement?

Matthew_Opitz · May 11, 2023, 1:32 AM
20 points
1 comment · 6 min read · LW link

[Question] How should one feel morally about using chatbots?

Adam Zerner · May 11, 2023, 1:01 AM
18 points
4 comments · 1 min read · LW link

[Question] AI interpretability could be harmful?

Roman Leventov · May 10, 2023, 8:43 PM
13 points
2 comments · 1 min read · LW link

Athens, Greece – ACX Meetups Everywhere Spring 2023

Spyros Dovas · May 10, 2023, 7:45 PM
1 point
0 comments · 1 min read · LW link

Better debates

TsviBT · May 10, 2023, 7:34 PM
78 points
7 comments · 3 min read · LW link

Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023)

May 10, 2023, 7:04 PM
256 points
54 comments · 21 min read · LW link

A Corrigibility Metaphore—Big Gambles

WCargo · May 10, 2023, 6:13 PM
16 points
0 comments · 4 min read · LW link

Roadmap for a collaborative prototype of an Open Agency Architecture

Deger Turan · May 10, 2023, 5:41 PM
31 points
0 comments · 12 min read · LW link

AGI-Automated Interpretability is Suicide

__RicG__ · May 10, 2023, 2:20 PM
25 points
33 comments · 7 min read · LW link

Class-Based Addressing

jefftk · May 10, 2023, 1:40 PM
22 points
6 comments · 1 min read · LW link
(www.jefftk.com)

In defence of epistemic modesty [distillation]

Luise · May 10, 2023, 9:44 AM
17 points
2 comments · 9 min read · LW link

[Question] How much of a concern are open-source LLMs in the short, medium and long terms?

JavierCC · May 10, 2023, 9:14 AM
5 points
0 comments · 1 min read · LW link

10 great reasons why Lex Fridman should invite Eliezer and Robin to re-do the FOOM debate on his podcast

chaosmage · May 10, 2023, 8:27 AM
−7 points
1 comment · 1 min read · LW link
(www.reddit.com)

New OpenAI Paper—Language models can explain neurons in language models

MrThink · May 10, 2023, 7:46 AM
47 points
14 comments · 1 min read · LW link

Naturalist Experimentation

LoganStrohl · May 10, 2023, 4:28 AM
62 points
14 comments · 10 min read · LW link

[Question] Could A Superintelligence Out-Argue A Doomer?

tjaffee · May 10, 2023, 2:40 AM UTC
−16 points
6 comments · 1 min read · LW link

Gradient hacking via actual hacking

Max H · May 10, 2023, 1:57 AM UTC
12 points
7 comments · 3 min read · LW link

Red teaming: challenges and research directions

joshc · May 10, 2023, 1:40 AM UTC
31 points
1 comment · 10 min read · LW link