Resolve Cycles

CFAR!Duncan, 16 Jul 2022 23:17 UTC
134 points
8 comments, 10 min read

Alignment as Game Design

Shoshannah Tekofsky, 16 Jul 2022 22:36 UTC
11 points
7 comments, 2 min read

Risk Management from a Climbers Perspective

Annapurna, 16 Jul 2022 21:14 UTC
5 points
0 comments, 6 min read
(jorgevelez.substack.com)

Cognitive Instability, Physicalism, and Free Will

dadadarren, 16 Jul 2022 13:13 UTC
5 points
27 comments, 2 min read
(www.sleepingbeautyproblem.com)

All AGI safety questions welcome (especially basic ones) [July 2022]

16 Jul 2022 12:57 UTC
84 points
132 comments, 3 min read

QNR Prospects

PeterMcCluskey, 16 Jul 2022 2:03 UTC
40 points
3 comments, 8 min read
(www.bayesianinvestor.com)

To-do waves

Paweł Sysiak, 16 Jul 2022 1:19 UTC
3 points
0 comments, 3 min read

Moneypumping Bryan Caplan’s Belief in Free Will

Morpheus, 16 Jul 2022 0:46 UTC
5 points
9 comments, 1 min read

A summary of every “Highlights from the Sequences” post

Akash, 15 Jul 2022 23:01 UTC
94 points
7 comments, 17 min read

Safety Implications of LeCun’s path to machine intelligence

Ivan Vendrov, 15 Jul 2022 21:47 UTC
102 points
18 comments, 6 min read

Comfort Zone Exploration

CFAR!Duncan, 15 Jul 2022 21:18 UTC
49 points
2 comments, 12 min read

A time-invariant version of Laplace’s rule

15 Jul 2022 19:28 UTC
72 points
13 comments, 17 min read
(epochai.org)

An attempt to break circularity in science

fryolysis, 15 Jul 2022 18:32 UTC
3 points
5 comments, 1 min read

A story about a duplicitous API

LiLiLi, 15 Jul 2022 18:26 UTC
2 points
0 comments, 1 min read

Highlights from the memoirs of Vannevar Bush

jasoncrawford, 15 Jul 2022 18:08 UTC
11 points
0 comments, 13 min read
(rootsofprogress.org)

Notes on Learning the Prior

Spencer Becker-Kahn, 15 Jul 2022 17:28 UTC
22 points
2 comments, 25 min read

Review of The Engines of Cognition

William Gasarch, 15 Jul 2022 14:13 UTC
13 points
5 comments, 15 min read

A review of Nate Hilger’s The Parent Trap

David Hugh-Jones, 15 Jul 2022 9:30 UTC
15 points
8 comments, 4 min read
(wyclif.substack.com)

Musings on the Human Objective Function

Michael Soareverix, 15 Jul 2022 7:13 UTC
3 points
0 comments, 3 min read

Peter Singer’s first published piece on AI

Fai, 15 Jul 2022 6:18 UTC
20 points
5 comments, 1 min read
(link.springer.com)

Don’t use ‘infohazard’ for collectively destructive info

Eliezer Yudkowsky, 15 Jul 2022 5:13 UTC
85 points
33 comments, 1 min read, 2 reviews
(www.facebook.com)

Upcoming heatwave: advice

stavros, 15 Jul 2022 5:03 UTC
16 points
13 comments, 3 min read

A note about differential technological development

So8res, 15 Jul 2022 4:46 UTC
196 points
32 comments, 6 min read

Inward and outward steelmanning

Q Home, 14 Jul 2022 23:32 UTC
13 points
6 comments, 18 min read

Potato diet: A post mortem and an answer to SMTM’s article

Épiphanie Gédéon, 14 Jul 2022 23:18 UTC
47 points
34 comments, 16 min read

Proposed Orthogonality Theses #2-5

rjbg, 14 Jul 2022 22:59 UTC
8 points
0 comments, 2 min read

Better Quiddler

jefftk, 14 Jul 2022 17:40 UTC
17 points
0 comments, 1 min read
(www.jefftk.com)

Circumventing interpretability: How to defeat mind-readers

Lee Sharkey, 14 Jul 2022 16:59 UTC
112 points
12 comments, 33 min read

Covid 7/14/22: BA.2.75 Plus Tax

Zvi, 14 Jul 2022 14:40 UTC
39 points
9 comments, 8 min read
(thezvi.wordpress.com)

Criticism of EA Criticism Contest

Zvi, 14 Jul 2022 14:30 UTC
108 points
17 comments, 31 min read, 1 review
(thezvi.wordpress.com)

Humans provide an untapped wealth of evidence about alignment

14 Jul 2022 2:31 UTC
197 points
94 comments, 9 min read, 1 review

[Question] Wacky, risky, anti-inductive intelligence-enhancement methods?

NicholasKross, 14 Jul 2022 1:40 UTC
19 points
27 comments, 1 min read

[Question] How to impress students with recent advances in ML?

Charbel-Raphaël, 14 Jul 2022 0:03 UTC
12 points
2 comments, 1 min read

Notes on Love

David Gross, 13 Jul 2022 23:35 UTC
17 points
3 comments, 29 min read

Deep learning curriculum for large language model alignment

Jacob_Hilton, 13 Jul 2022 21:58 UTC
57 points
3 comments, 1 min read
(github.com)

Artificial Sandwiching: When can we test scalable alignment protocols without humans?

Sam Bowman, 13 Jul 2022 21:14 UTC
41 points
6 comments, 5 min read

[Question] Any tips for eliciting one’s own latent knowledge?

MSRayne, 13 Jul 2022 21:12 UTC
16 points
20 comments, 2 min read

Goal Alignment Is Robust To the Sharp Left Turn

Thane Ruthenis, 13 Jul 2022 20:23 UTC
47 points
16 comments, 4 min read

Making decisions using multiple worldviews

Richard_Ngo, 13 Jul 2022 19:15 UTC
50 points
10 comments, 11 min read

[Question] App idea to help with reading STEM textbooks (feedback request)

DirectedEvolution, 13 Jul 2022 18:28 UTC
16 points
8 comments, 2 min read

MIRI Conversations: Technology Forecasting & Gradualism (Distillation)

CallumMcDougall, 13 Jul 2022 15:55 UTC
31 points
1 comment, 20 min read

Passing Up Pay

jefftk, 13 Jul 2022 14:10 UTC
29 points
8 comments, 5 min read
(www.jefftk.com)

[Question] How could the universe be infinitely large?

amarai, 13 Jul 2022 13:45 UTC
0 points
8 comments, 1 min read

John von Neumann on how to safely progress with technology

Dalton Mabery, 13 Jul 2022 11:07 UTC
14 points
0 comments, 1 min read

Everyone is an Imposter

Tharin, 13 Jul 2022 8:46 UTC
19 points
1 comment, 9 min read
(echoesandchimes.com)

[Question] Which AI Safety research agendas are the most promising?

Chris_Leong, 13 Jul 2022 7:54 UTC
27 points
5 comments, 1 min read

Straw-Steelmanning

Chris van Merwijk, 13 Jul 2022 5:48 UTC
30 points
2 comments, 1 min read

Alien Message Contest: Solution

DaemonicSigil, 13 Jul 2022 4:07 UTC
29 points
2 comments, 4 min read

[Question] What is wrong with this approach to corrigibility?

Rafael Cosman, 12 Jul 2022 22:55 UTC
7 points
8 comments, 1 min read

Acceptability Verification: A Research Agenda

12 Jul 2022 20:11 UTC
50 points
0 comments, 1 min read
(docs.google.com)