Announcing Manifund Regrants

Austin Chen
Jul 5, 2023, 7:42 PM
74 points
8 comments

Infra-Bayesian Logic

Jul 5, 2023, 7:16 PM
15 points
2 comments
1 min read

[Linkpost] Introducing Superalignment

beren
Jul 5, 2023, 6:23 PM
175 points
69 comments
1 min read
(openai.com)

If you wish to make an apple pie, you must first become dictator of the universe

jasoncrawford
Jul 5, 2023, 6:14 PM
27 points
9 comments
13 min read
(rootsofprogress.org)

An AGI kill switch with defined security properties

Peterpiper
Jul 5, 2023, 5:40 PM
−5 points
6 comments
1 min read

The risk-reward tradeoff of interpretability research

Jul 5, 2023, 5:05 PM
15 points
1 comment
6 min read

(tentatively) Found 600+ Monosemantic Features in a Small LM Using Sparse Autoencoders

Logan Riggs
Jul 5, 2023, 4:49 PM
60 points
1 comment
7 min read

[Question] What did AI Safety’s specific funding of AGI R&D labs lead to?

Remmelt
Jul 5, 2023, 3:51 PM
3 points
0 comments

AISN #13: An interdisciplinary perspective on AI proxy failures, new competitors to ChatGPT, and prompting language models to misbehave

Dan H
Jul 5, 2023, 3:33 PM
13 points
0 comments

Exploring Functional Decision Theory (FDT) and a modified version (ModFDT)

MiguelDev
Jul 5, 2023, 2:06 PM
11 points
11 comments
15 min read

Optimized for Something other than Winning or: How Cricket Resists Moloch and Goodhart’s Law

A.H.
Jul 5, 2023, 12:33 PM
53 points
26 comments
4 min read

Puffer-pope reality check

Neil
Jul 5, 2023, 9:27 AM
20 points
2 comments
1 min read

Final Lightspeed Grants coworking/office hours before the application deadline

habryka
Jul 5, 2023, 6:03 AM
13 points
2 comments
1 min read

MXR Talkbox Cap?

jefftk
Jul 5, 2023, 1:50 AM
9 points
0 comments
1 min read
(www.jefftk.com)

“Reification”

herschel
Jul 5, 2023, 12:53 AM
11 points
4 comments
2 min read

Dominant Assurance Contract Experiment #2: Berkeley House Dinners

Arjun Panickssery
Jul 5, 2023, 12:13 AM
51 points
8 comments
1 min read
(arjunpanickssery.substack.com)

Three camps in AI x-risk discussions: My personal very oversimplified overview

Aryeh Englander
Jul 4, 2023, 8:42 PM
21 points
0 comments

Six (and a half) intuitions for SVD

CallumMcDougall
Jul 4, 2023, 7:23 PM
71 points
1 comment
1 min read

Animal Weapons: Lessons for Humans in the Age of X-Risk

Damin Curtis
Jul 4, 2023, 6:14 PM
4 points
0 comments
10 min read

Apocalypse Prepping—Concise SHTF guide to prepare for AGI doomsday

prepper
Jul 4, 2023, 5:41 PM
−7 points
9 comments
1 min read
(prepper.i2phides.me)

Ways I Expect AI Regulation To Increase Extinction Risk

1a3orn
Jul 4, 2023, 5:32 PM
226 points
32 comments
7 min read

AI labs’ statements on governance

Zach Stein-Perlman
Jul 4, 2023, 4:30 PM
30 points
0 comments
36 min read

AIs teams will probably be more superintelligent than individual AIs

Robert_AIZI
Jul 4, 2023, 2:06 PM
3 points
1 comment
2 min read
(aizi.substack.com)

What I Think About When I Think About History

Jacob G-W
Jul 4, 2023, 2:02 PM
3 points
4 comments
3 min read
(g-w1.github.io)

My Time As A Goddess

Evenstar
Jul 4, 2023, 1:14 PM
30 points
5 comments
6 min read

Twitter Twitches

Zvi
Jul 4, 2023, 1:00 PM
34 points
9 comments
7 min read
(thezvi.wordpress.com)

Rational Unilateralists Aren’t So Cursed

SCP
Jul 4, 2023, 12:19 PM
56 points
6 comments
6 min read
1 review

[Question] The literature on aluminum adjuvants is very suspicious. Small IQ tax is plausible—can any experts help me estimate it?

mikes
Jul 4, 2023, 9:33 AM
61 points
39 comments
3 min read

Two Percolation Puzzles

Adam Scherlis
Jul 4, 2023, 5:34 AM
43 points
14 comments
1 min read
(adam.scherlis.com)

Mechanistic Interpretability is Being Pursued for the Wrong Reasons

Cole Wyeth
Jul 4, 2023, 2:17 AM
13 points
0 comments
7 min read
(colewyeth.com)

Should you announce your bets publicly?

Ege Erdil
Jul 4, 2023, 12:11 AM
28 points
1 comment
4 min read

Ten Levels of AI Alignment Difficulty

Sammy Martin
Jul 3, 2023, 8:20 PM
138 points
24 comments
12 min read
1 review

Security, Cryptography AI Workshop in SF

Allison Duettmann
Jul 3, 2023, 7:01 PM
7 points
0 comments
1 min read

[Question] What in your opinion is the biggest open problem in AI alignment?

tailcalled
Jul 3, 2023, 4:34 PM
39 points
35 comments
1 min read

A Subtle Selection Effect in Overconfidence Studies

Kevin Dorst
Jul 3, 2023, 2:43 PM
24 points
0 comments
6 min read
(kevindorst.substack.com)

Monthly Roundup #8: July 2023

Zvi
Jul 3, 2023, 1:20 PM
40 points
4 comments
46 min read
(thezvi.wordpress.com)

Complex Signs Bad

Evenstar
Jul 3, 2023, 1:09 PM
5 points
2 comments
3 min read

6/23

Celer
Jul 3, 2023, 6:30 AM
8 points
0 comments
10 min read
(keller.substack.com)

Marginal charity

Pat Myron
Jul 3, 2023, 2:13 AM
3 points
1 comment

My Central Alignment Priority (2 July 2023)

Nicholas / Heather Kross
Jul 3, 2023, 1:46 AM
12 points
1 comment
3 min read

My Alignment Timeline

Nicholas / Heather Kross
Jul 3, 2023, 1:04 AM
22 points
0 comments
2 min read

Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?

gwern
Jul 3, 2023, 12:48 AM
426 points
54 comments
7 min read
(www.youtube.com)

Frames in context

Richard_Ngo
Jul 3, 2023, 12:38 AM
39 points
9 comments
6 min read

Meta-rationality and frames

Richard_Ngo
Jul 3, 2023, 12:33 AM
64 points
2 comments
5 min read

VC Theory Overview

Joar Skalse
Jul 2, 2023, 10:45 PM
12 points
2 comments
11 min read

Sources of evidence in Alignment

Martín Soto
Jul 2, 2023, 8:38 PM
20 points
0 comments
11 min read

Quantitative cruxes in Alignment

Martín Soto
Jul 2, 2023, 8:38 PM
19 points
0 comments
23 min read

Going Crazy and Getting Better Again

Evenstar
Jul 2, 2023, 6:55 PM
139 points
13 comments
7 min read
1 review

Shall We Throw A Huge Party Before AGI Bids Us Adieu?

GeorgeMan
Jul 2, 2023, 5:56 PM
−1 points
6 comments
1 min read

Why it’s so hard to talk about Consciousness

Rafael Harth
Jul 2, 2023, 3:56 PM
167 points
215 comments
9 min read
3 reviews