Announcing Manifund Regrants

Austin Chen 5 Jul 2023 19:42 UTC
74 points
8 comments 1 min read LW link

Infra-Bayesian Logic

5 Jul 2023 19:16 UTC
15 points
2 comments 1 min read LW link

[Linkpost] Introducing Superalignment

beren 5 Jul 2023 18:23 UTC
173 points
68 comments 1 min read LW link
(openai.com)

If you wish to make an apple pie, you must first become dictator of the universe

jasoncrawford 5 Jul 2023 18:14 UTC
27 points
9 comments 13 min read LW link
(rootsofprogress.org)

An AGI kill switch with defined security properties

Peterpiper 5 Jul 2023 17:40 UTC
−5 points
6 comments 1 min read LW link

The risk-reward tradeoff of interpretability research

5 Jul 2023 17:05 UTC
15 points
1 comment 6 min read LW link

(tentatively) Found 600+ Monosemantic Features in a Small LM Using Sparse Autoencoders

Logan Riggs 5 Jul 2023 16:49 UTC
58 points
1 comment 7 min read LW link

[Question] What did AI Safety’s specific funding of AGI R&D labs lead to?

Remmelt 5 Jul 2023 15:51 UTC
3 points
0 comments 1 min read LW link

AISN #13: An interdisciplinary perspective on AI proxy failures, new competitors to ChatGPT, and prompting language models to misbehave

Dan H 5 Jul 2023 15:33 UTC
13 points
0 comments 1 min read LW link

Exploring Functional Decision Theory (FDT) and a modified version (ModFDT)

MiguelDev 5 Jul 2023 14:06 UTC
8 points
11 comments 15 min read LW link

Optimized for Something other than Winning or: How Cricket Resists Moloch and Goodhart’s Law

A.H. 5 Jul 2023 12:33 UTC
53 points
25 comments 4 min read LW link

Puffer-pope reality check

Neil 5 Jul 2023 9:27 UTC
20 points
2 comments 1 min read LW link

Final Lightspeed Grants coworking/office hours before the application deadline

habryka 5 Jul 2023 6:03 UTC
13 points
2 comments 1 min read LW link

MXR Talkbox Cap?

jefftk 5 Jul 2023 1:50 UTC
9 points
0 comments 1 min read LW link
(www.jefftk.com)

“Reification”

herschel 5 Jul 2023 0:53 UTC
11 points
4 comments 2 min read LW link

Dominant Assurance Contract Experiment #2: Berkeley House Dinners

Arjun Panickssery 5 Jul 2023 0:13 UTC
60 points
8 comments 1 min read LW link
(arjunpanickssery.substack.com)

Three camps in AI x-risk discussions: My personal very oversimplified overview

Aryeh Englander 4 Jul 2023 20:42 UTC
21 points
0 comments 1 min read LW link

Six (and a half) intuitions for SVD

CallumMcDougall 4 Jul 2023 19:23 UTC
66 points
1 comment 1 min read LW link

Animal Weapons: Lessons for Humans in the Age of X-Risk

Damin Curtis 4 Jul 2023 18:14 UTC
3 points
0 comments 10 min read LW link

Apocalypse Prepping - Concise SHTF guide to prepare for AGI doomsday

prepper 4 Jul 2023 17:41 UTC
−8 points
9 comments 1 min read LW link
(prepper.i2phides.me)

Ways I Expect AI Regulation To Increase Extinction Risk

1a3orn 4 Jul 2023 17:32 UTC
215 points
32 comments 7 min read LW link

AI labs’ statements on governance

Zach Stein-Perlman 4 Jul 2023 16:30 UTC
30 points
0 comments 36 min read LW link

AIs teams will probably be more superintelligent than individual AIs

Robert_AIZI 4 Jul 2023 14:06 UTC
3 points
1 comment 2 min read LW link
(aizi.substack.com)

What I Think About When I Think About History

Jacob G-W 4 Jul 2023 14:02 UTC
2 points
4 comments 3 min read LW link
(g-w1.github.io)

My Time As A Goddess

Evenstar 4 Jul 2023 13:14 UTC
26 points
5 comments 6 min read LW link

Twitter Twitches

Zvi 4 Jul 2023 13:00 UTC
34 points
9 comments 7 min read LW link
(thezvi.wordpress.com)

Rational Unilateralists Aren’t So Cursed

Sami Petersen 4 Jul 2023 12:19 UTC
44 points
5 comments 1 min read LW link

[Question] The literature on aluminum adjuvants is very suspicious. Small IQ tax is plausible - can any experts help me estimate it?

mikes 4 Jul 2023 9:33 UTC
58 points
39 comments 3 min read LW link

Two Percolation Puzzles

Adam Scherlis 4 Jul 2023 5:34 UTC
43 points
14 comments 1 min read LW link
(adam.scherlis.com)

Mechanistic Interpretability is Being Pursued for the Wrong Reasons

Cole Wyeth 4 Jul 2023 2:17 UTC
7 points
0 comments 7 min read LW link
(colewyeth.com)

Should you announce your bets publicly?

Ege Erdil 4 Jul 2023 0:11 UTC
15 points
1 comment 4 min read LW link

Ten Levels of AI Alignment Difficulty

Sammy Martin 3 Jul 2023 20:20 UTC
112 points
12 comments 12 min read LW link

Security, Cryptography AI Workshop in SF

Allison Duettmann 3 Jul 2023 19:01 UTC
7 points
0 comments 1 min read LW link

[Question] What in your opinion is the biggest open problem in AI alignment?

tailcalled 3 Jul 2023 16:34 UTC
39 points
35 comments 1 min read LW link

A Subtle Selection Effect in Overconfidence Studies

Kevin Dorst 3 Jul 2023 14:43 UTC
24 points
0 comments 6 min read LW link
(kevindorst.substack.com)

Monthly Roundup #8: July 2023

Zvi 3 Jul 2023 13:20 UTC
40 points
4 comments 46 min read LW link
(thezvi.wordpress.com)

Complex Signs Bad

Evenstar 3 Jul 2023 13:09 UTC
5 points
2 comments 3 min read LW link

6/23

Celer 3 Jul 2023 6:30 UTC
8 points
0 comments 10 min read LW link
(keller.substack.com)

Marginal charity

Pat Myron 3 Jul 2023 2:13 UTC
3 points
1 comment 1 min read LW link

My Central Alignment Priority (2 July 2023)

NicholasKross 3 Jul 2023 1:46 UTC
12 points
1 comment 3 min read LW link

My Alignment Timeline

NicholasKross 3 Jul 2023 1:04 UTC
22 points
0 comments 2 min read LW link

Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)?

gwern 3 Jul 2023 0:48 UTC
411 points
54 comments 7 min read LW link
(www.youtube.com)

Frames in context

Richard_Ngo 3 Jul 2023 0:38 UTC
39 points
9 comments 6 min read LW link

Meta-rationality and frames

Richard_Ngo 3 Jul 2023 0:33 UTC
63 points
2 comments 5 min read LW link

VC Theory Overview

Joar Skalse 2 Jul 2023 22:45 UTC
10 points
2 comments 11 min read LW link

Sources of evidence in Alignment

Martín Soto 2 Jul 2023 20:38 UTC
20 points
0 comments 11 min read LW link

Quantitative cruxes in Alignment

Martín Soto 2 Jul 2023 20:38 UTC
21 points
0 comments 23 min read LW link

Going Crazy and Getting Better Again

Evenstar 2 Jul 2023 18:55 UTC
118 points
10 comments 7 min read LW link

Shall We Throw A Huge Party Before AGI Bids Us Adieu?

GeorgeMan 2 Jul 2023 17:56 UTC
−1 points
6 comments 1 min read LW link

Why it’s so hard to talk about Consciousness

Rafael Harth 2 Jul 2023 15:56 UTC
87 points
152 comments 9 min read LW link