Pitfalls with Proofs

scasper 19 Jul 2022 22:21 UTC
19 points
21 comments · 8 min read · LW link

A daily routine I do for my AI safety research work

scasper 19 Jul 2022 21:58 UTC
21 points
7 comments · 1 min read · LW link

Progress links and tweets, 2022-07-19

jasoncrawford 19 Jul 2022 20:50 UTC
11 points
1 comment · 1 min read · LW link
(rootsofprogress.org)

Applications are open for CFAR workshops in Prague this fall!

John Steidley 19 Jul 2022 18:29 UTC
64 points
3 comments · 2 min read · LW link

Sexual Abuse attitudes might be infohazardous

Pseudonymous Otter 19 Jul 2022 18:06 UTC
254 points
71 comments · 1 min read · LW link

Spending Update 2022

jefftk 19 Jul 2022 14:10 UTC
28 points
0 comments · 3 min read · LW link
(www.jefftk.com)

Abram Demski’s ELK thoughts and proposal - distillation

Rubi J. Hudson 19 Jul 2022 6:57 UTC
16 points
8 comments · 16 min read · LW link

Bounded complexity of solving ELK and its implications

Rubi J. Hudson 19 Jul 2022 6:56 UTC
11 points
4 comments · 18 min read · LW link

Help ARC evaluate capabilities of current language models (still need people)

Beth Barnes 19 Jul 2022 4:55 UTC
95 points
6 comments · 2 min read · LW link

A Critique of AI Alignment Pessimism

ExCeph 19 Jul 2022 2:28 UTC
9 points
1 comment · 9 min read · LW link

Ars D&D.Sci: Mysteries of Mana Evaluation & Ruleset

aphyer 19 Jul 2022 2:06 UTC
30 points
4 comments · 5 min read · LW link

Marburg Virus Pandemic Prediction Checklist

DirectedEvolution 18 Jul 2022 23:15 UTC
30 points
0 comments · 5 min read · LW link

At what point will we know if Eliezer’s predictions are right or wrong?

anonymous123456 18 Jul 2022 22:06 UTC
5 points
6 comments · 1 min read · LW link

Modelling Deception

Garrett Baker 18 Jul 2022 21:21 UTC
15 points
0 comments · 7 min read · LW link

Are Intelligence and Generality Orthogonal?

cubefox 18 Jul 2022 20:07 UTC
18 points
16 comments · 1 min read · LW link

Without specific countermeasures, the easiest path to transformative AI likely leads to AI takeover

Ajeya Cotra 18 Jul 2022 19:06 UTC
364 points
94 comments · 75 min read · LW link · 1 review

Turning Some Inconsistent Preferences into Consistent Ones

niplav 18 Jul 2022 18:40 UTC
23 points
5 comments · 12 min read · LW link

Addendum: A non-magical explanation of Jeffrey Epstein

lc 18 Jul 2022 17:40 UTC
80 points
21 comments · 11 min read · LW link

Launching a new progress institute, seeking a CEO

jasoncrawford 18 Jul 2022 16:58 UTC
25 points
2 comments · 3 min read · LW link
(rootsofprogress.org)

Machine Learning Model Sizes and the Parameter Gap [abridged]

Pablo Villalobos 18 Jul 2022 16:51 UTC
20 points
0 comments · 1 min read · LW link
(epochai.org)

Quantilizers and Generative Models

Adam Jermyn 18 Jul 2022 16:32 UTC
24 points
5 comments · 4 min read · LW link

AI Hiroshima (Does A Vivid Example Of Destruction Forestall Apocalypse?)

Sable 18 Jul 2022 12:06 UTC
4 points
4 comments · 2 min read · LW link

How the ---- did Feynman Get Here !?

George3d6 18 Jul 2022 9:43 UTC
8 points
8 comments · 3 min read · LW link
(www.epistem.ink)

Conditioning Generative Models for Alignment

Jozdien 18 Jul 2022 7:11 UTC
58 points
8 comments · 20 min read · LW link

Training goals for large language models

Johannes Treutlein 18 Jul 2022 7:09 UTC
28 points
5 comments · 19 min read · LW link

A distillation of Evan Hubinger’s training stories (for SERI MATS)

Daphne_W 18 Jul 2022 3:38 UTC
15 points
1 comment · 10 min read · LW link

Forecasting ML Benchmarks in 2023

jsteinhardt 18 Jul 2022 2:50 UTC
36 points
20 comments · 12 min read · LW link
(bounded-regret.ghost.io)

What should you change in response to an “emergency”? And AI risk

AnnaSalamon 18 Jul 2022 1:11 UTC
329 points
60 comments · 6 min read · LW link · 1 review

Deception?! I ain’t got time for that!

Paul Colognese 18 Jul 2022 0:06 UTC
55 points
5 comments · 13 min read · LW link

How Interpretability can be Impactful

Connall Garrod 18 Jul 2022 0:06 UTC
18 points
0 comments · 37 min read · LW link

Why you might expect homogeneous take-off: evidence from ML research

Andrei Alexandru 17 Jul 2022 20:31 UTC
24 points
0 comments · 10 min read · LW link

Examples of AI Increasing AI Progress

ThomasW 17 Jul 2022 20:06 UTC
107 points
14 comments · 1 min read · LW link

Four questions I ask AI safety researchers

Akash 17 Jul 2022 17:25 UTC
17 points
0 comments · 1 min read · LW link

Why I Think Abrupt AI Takeoff

lincolnquirk 17 Jul 2022 17:04 UTC
14 points
6 comments · 1 min read · LW link

Culture wars in riddle format

Malmesbury 17 Jul 2022 14:51 UTC
7 points
28 comments · 3 min read · LW link

Bangalore LW/ACX Meetup in person

Vyakart 17 Jul 2022 6:53 UTC
1 point
0 comments · 1 min read · LW link

Resolve Cycles

CFAR!Duncan 16 Jul 2022 23:17 UTC
134 points
8 comments · 10 min read · LW link

Alignment as Game Design

Shoshannah Tekofsky 16 Jul 2022 22:36 UTC
11 points
7 comments · 2 min read · LW link

Risk Management from a Climbers Perspective

Annapurna 16 Jul 2022 21:14 UTC
5 points
0 comments · 6 min read · LW link
(jorgevelez.substack.com)

Cognitive Instability, Physicalism, and Free Will

dadadarren 16 Jul 2022 13:13 UTC
5 points
27 comments · 2 min read · LW link
(www.sleepingbeautyproblem.com)

All AGI safety questions welcome (especially basic ones) [July 2022]

16 Jul 2022 12:57 UTC
84 points
132 comments · 3 min read · LW link

QNR Prospects

PeterMcCluskey 16 Jul 2022 2:03 UTC
40 points
3 comments · 8 min read · LW link
(www.bayesianinvestor.com)

To-do waves

Paweł Sysiak 16 Jul 2022 1:19 UTC
3 points
0 comments · 3 min read · LW link

Moneypumping Bryan Caplan’s Belief in Free Will

Morpheus 16 Jul 2022 0:46 UTC
5 points
9 comments · 1 min read · LW link

A summary of every “Highlights from the Sequences” post

Akash 15 Jul 2022 23:01 UTC
94 points
7 comments · 17 min read · LW link

Safety Implications of LeCun’s path to machine intelligence

Ivan Vendrov 15 Jul 2022 21:47 UTC
102 points
18 comments · 6 min read · LW link

Comfort Zone Exploration

CFAR!Duncan 15 Jul 2022 21:18 UTC
49 points
2 comments · 12 min read · LW link

A time-invariant version of Laplace’s rule

15 Jul 2022 19:28 UTC
72 points
13 comments · 17 min read · LW link
(epochai.org)

An attempt to break circularity in science

fryolysis 15 Jul 2022 18:32 UTC
3 points
5 comments · 1 min read · LW link

A story about a duplicitous API

LiLiLi 15 Jul 2022 18:26 UTC
2 points
0 comments · 1 min read · LW link