Defining Optimization in a Deeper Way Part 3

J Bostock · 20 Jul 2022 22:06 UTC
8 points
0 comments · 2 min read · LW link

Cognitive Risks of Adolescent Binge Drinking

20 Jul 2022 21:10 UTC
70 points
12 comments · 10 min read · LW link
(acesounderglass.com)

Why AGI Timeline Research/Discourse Might Be Overrated

Noosphere89 · 20 Jul 2022 20:26 UTC
5 points
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)

Enlightenment Values in a Vulnerable World

Maxwell Tabarrok · 20 Jul 2022 19:52 UTC
15 points
6 comments · 31 min read · LW link
(maximumprogress.substack.com)

Countering arguments against working on AI safety

Rauno Arike · 20 Jul 2022 18:23 UTC
7 points
2 comments · 7 min read · LW link

A Short Intro to Humans

Ben Amitay · 20 Jul 2022 15:28 UTC
1 point
1 comment · 7 min read · LW link

How to Diversify Conceptual Alignment: the Model Behind Refine

adamShimi · 20 Jul 2022 10:44 UTC
87 points
11 comments · 8 min read · LW link

[Question] What are the simplest questions in applied rationality where you don’t know the answer to?

ChristianKl · 20 Jul 2022 9:53 UTC
26 points
11 comments · 1 min read · LW link

AI Safety Cheatsheet / Quick Reference

Zohar Jackson · 20 Jul 2022 9:39 UTC
3 points
0 comments · 1 min read · LW link
(github.com)

Getting Unstuck on Counterfactuals

Chris_Leong · 20 Jul 2022 5:31 UTC
7 points
1 comment · 2 min read · LW link

Pitfalls with Proofs

scasper · 19 Jul 2022 22:21 UTC
19 points
21 comments · 8 min read · LW link

A daily routine I do for my AI safety research work

scasper · 19 Jul 2022 21:58 UTC
21 points
7 comments · 1 min read · LW link

Progress links and tweets, 2022-07-19

jasoncrawford · 19 Jul 2022 20:50 UTC
11 points
1 comment · 1 min read · LW link
(rootsofprogress.org)

Applications are open for CFAR workshops in Prague this fall!

John Steidley · 19 Jul 2022 18:29 UTC
64 points
3 comments · 2 min read · LW link

Sexual Abuse attitudes might be infohazardous

Pseudonymous Otter · 19 Jul 2022 18:06 UTC
254 points
71 comments · 1 min read · LW link

Spending Update 2022

jefftk · 19 Jul 2022 14:10 UTC
28 points
0 comments · 3 min read · LW link
(www.jefftk.com)

Abram Demski’s ELK thoughts and proposal—distillation

Rubi J. Hudson · 19 Jul 2022 6:57 UTC
16 points
8 comments · 16 min read · LW link

Bounded complexity of solving ELK and its implications

Rubi J. Hudson · 19 Jul 2022 6:56 UTC
11 points
4 comments · 18 min read · LW link

Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes · 19 Jul 2022 4:55 UTC
95 points
6 comments · 2 min read · LW link

A Critique of AI Alignment Pessimism

ExCeph · 19 Jul 2022 2:28 UTC
9 points
1 comment · 9 min read · LW link

Ars D&D.Sci: Mysteries of Mana Evaluation & Ruleset

aphyer · 19 Jul 2022 2:06 UTC
30 points
4 comments · 5 min read · LW link

Marburg Virus Pandemic Prediction Checklist

DirectedEvolution · 18 Jul 2022 23:15 UTC
30 points
0 comments · 5 min read · LW link

At what point will we know if Eliezer’s predictions are right or wrong?

anonymous123456 · 18 Jul 2022 22:06 UTC
5 points
6 comments · 1 min read · LW link

Modelling Deception

Garrett Baker · 18 Jul 2022 21:21 UTC
15 points
0 comments · 7 min read · LW link

Are Intelligence and Generality Orthogonal?

cubefox · 18 Jul 2022 20:07 UTC
18 points
16 comments · 1 min read · LW link

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra · 18 Jul 2022 19:06 UTC
364 points
94 comments · 75 min read · LW link · 1 review

Turning Some Inconsistent Preferences into Consistent Ones

niplav · 18 Jul 2022 18:40 UTC
23 points
5 comments · 12 min read · LW link

Addendum: A non-magical explanation of Jeffrey Epstein

lc · 18 Jul 2022 17:40 UTC
80 points
21 comments · 11 min read · LW link

Launching a new progress institute, seeking a CEO

jasoncrawford · 18 Jul 2022 16:58 UTC
25 points
2 comments · 3 min read · LW link
(rootsofprogress.org)

Machine Learning Model Sizes and the Parameter Gap [abridged]

Pablo Villalobos · 18 Jul 2022 16:51 UTC
20 points
0 comments · 1 min read · LW link
(epochai.org)

Quantilizers and Generative Models

Adam Jermyn · 18 Jul 2022 16:32 UTC
24 points
5 comments · 4 min read · LW link

AI Hiroshima (Does A Vivid Example Of Destruction Forestall Apocalypse?)

Sable · 18 Jul 2022 12:06 UTC
4 points
4 comments · 2 min read · LW link

How the ---- did Feynman Get Here !?

George3d6 · 18 Jul 2022 9:43 UTC
8 points
8 comments · 3 min read · LW link
(www.epistem.ink)

Conditioning Generative Models for Alignment

Jozdien · 18 Jul 2022 7:11 UTC
58 points
8 comments · 20 min read · LW link

Training goals for large language models

Johannes Treutlein · 18 Jul 2022 7:09 UTC
28 points
5 comments · 19 min read · LW link

A distillation of Evan Hubinger’s training stories (for SERI MATS)

Daphne_W · 18 Jul 2022 3:38 UTC
15 points
1 comment · 10 min read · LW link

Forecasting ML Benchmarks in 2023

jsteinhardt · 18 Jul 2022 2:50 UTC
36 points
20 comments · 12 min read · LW link
(bounded-regret.ghost.io)

What should you change in response to an “emergency”? And AI risk

AnnaSalamon · 18 Jul 2022 1:11 UTC
329 points
60 comments · 6 min read · LW link · 1 review

Deception?! I ain’t got time for that!

Paul Colognese · 18 Jul 2022 0:06 UTC
55 points
5 comments · 13 min read · LW link

How Interpretability can be Impactful

Connall Garrod · 18 Jul 2022 0:06 UTC
18 points
0 comments · 37 min read · LW link

Why you might expect homogeneous take-off: evidence from ML research

Andrei Alexandru · 17 Jul 2022 20:31 UTC
24 points
0 comments · 10 min read · LW link

Examples of AI Increasing AI Progress

ThomasW · 17 Jul 2022 20:06 UTC
107 points
14 comments · 1 min read · LW link

Four questions I ask AI safety researchers

Akash · 17 Jul 2022 17:25 UTC
17 points
0 comments · 1 min read · LW link

Why I Think Abrupt AI Takeoff

lincolnquirk · 17 Jul 2022 17:04 UTC
14 points
6 comments · 1 min read · LW link

Culture wars in riddle format

Malmesbury · 17 Jul 2022 14:51 UTC
7 points
28 comments · 3 min read · LW link

Bangalore LW/ACX Meetup in person

Vyakart · 17 Jul 2022 6:53 UTC
1 point
0 comments · 1 min read · LW link

Resolve Cycles

CFAR!Duncan · 16 Jul 2022 23:17 UTC
134 points
8 comments · 10 min read · LW link

Alignment as Game Design

Shoshannah Tekofsky · 16 Jul 2022 22:36 UTC
11 points
7 comments · 2 min read · LW link

Risk Management from a Climber’s Perspective

Annapurna · 16 Jul 2022 21:14 UTC
5 points
0 comments · 6 min read · LW link
(jorgevelez.substack.com)

Cognitive Instability, Physicalism, and Free Will

dadadarren · 16 Jul 2022 13:13 UTC
5 points
27 comments · 2 min read · LW link
(www.sleepingbeautyproblem.com)