A short calculation about a Twitter poll

Ege Erdil
14 Aug 2023 19:48 UTC
62 points
64 comments
11 min read
LW link

Decomposing independent generalizations in neural networks via Hessian analysis

14 Aug 2023 17:04 UTC
82 points
3 comments
1 min read
LW link

Memetic Judo #2: Incorporal Switches and Levers Compendium

Max TK
14 Aug 2023 16:53 UTC
19 points
6 comments
17 min read
LW link

Longer-term Behaviour of Generative Companion AIs: A Social Simulation Approach

Reed
14 Aug 2023 15:24 UTC
5 points
0 comments
7 min read
LW link

Existentially relevant thought experiment: To kill or not to kill, a sniper, a man and a button.

AlexFromSafeTransition
14 Aug 2023 10:53 UTC
−18 points
6 comments
4 min read
LW link

Stepping down as moderator on LW

Kaj_Sotala
14 Aug 2023 10:46 UTC
82 points
1 comment
1 min read
LW link

Announcing Manifest 2023 (Sep 22-24 in Berkeley)

14 Aug 2023 5:13 UTC
31 points
0 comments
2 min read
LW link

Coherence Therapy with LLMs—quick demo

Chipmonk
14 Aug 2023 3:34 UTC
19 points
11 comments
1 min read
LW link

Listen For What You Don’t Hear: The Case for Contrarianism

Yashvardhan Sharma
14 Aug 2023 2:53 UTC
3 points
1 comment
5 min read
LW link

Recipe: Hessian eigenvector computation for PyTorch models

Nina Rimsky
14 Aug 2023 2:48 UTC
30 points
5 comments
5 min read
LW link

[Question] Assuming LK99 or similar: how to accelerate commercialization?

ryan_b
13 Aug 2023 21:34 UTC
7 points
5 comments
1 min read
LW link

Twin Cities ACX Meetup September 2023

Timothy M.
13 Aug 2023 20:10 UTC
1 point
4 comments
1 min read
LW link

Fundamental Uncertainty: Chapter 1 - How can we know what’s true?

Gordon Seidoh Worley
13 Aug 2023 18:55 UTC
17 points
4 comments
12 min read
LW link

We Should Prepare for a Larger Representation of Academia in AI Safety

Leon Lang
13 Aug 2023 18:03 UTC
89 points
13 comments
5 min read
LW link

AGI is easier than robotaxis

Daniel Kokotajlo
13 Aug 2023 17:00 UTC
38 points
30 comments
4 min read
LW link

[Question] If we’re alive in 5 years, do you think the funding situation will be much better by then? (With large amounts of government funding, for example)

kuira
13 Aug 2023 16:32 UTC
−2 points
6 comments
1 min read
LW link

Abstract Theories of Everything

Philosophistry
13 Aug 2023 6:06 UTC
−17 points
0 comments
1 min read
LW link

[Linkpost] Personal and Psychological Dimensions of AI Researchers Confronting AI Catastrophic Risks

Bogdan Ionut Cirstea
12 Aug 2023 22:02 UTC
42 points
0 comments
1 min read
LW link

The Empathy Engine: A Deconstruction of the Societal Metamorphosis through Technological Empathy Augmentation

bigdickproblems
12 Aug 2023 18:23 UTC
−30 points
3 comments
2 min read
LW link

The Benevolent Ruler’s Handbook (Part 2): Morality Rules

FCCC
12 Aug 2023 14:25 UTC
5 points
0 comments
4 min read
LW link

Learning as you play: anthropic shadow in deadly games

dr_s
12 Aug 2023 7:34 UTC
37 points
28 comments
35 min read
LW link

Biological Anchors: The Trick that Might or Might Not Work

Scott Alexander
12 Aug 2023 0:53 UTC
90 points
3 comments
33 min read
LW link
(astralcodexten.substack.com)

Simulate the CEO

robotelvis
12 Aug 2023 0:09 UTC
23 points
4 comments
5 min read
LW link
(messyprogress.substack.com)

How to decide under low-stakes uncertainty

dkl9
11 Aug 2023 18:07 UTC
10 points
4 comments
1 min read
LW link
(dkl9.net)

The Pandemic is Only Beginning: The Long COVID Disaster

salvatore mattera
11 Aug 2023 17:36 UTC
−6 points
15 comments
8 min read
LW link

When discussing AI risks, talk about capabilities, not intelligence

Vika
11 Aug 2023 13:38 UTC
116 points
7 comments
3 min read
LW link
(vkrakovna.wordpress.com)

What are the flaws in this AGI argument?

William the Kiwi
11 Aug 2023 11:31 UTC
5 points
14 comments
1 min read
LW link

Google DeepMind’s RT-2

SandXbox
11 Aug 2023 11:26 UTC
9 points
1 comment
1 min read
LW link
(robotics-transformer2.github.io)

Linkpost: We need another Expert Survey on Progress in AI, urgently

David Mears
11 Aug 2023 8:22 UTC
25 points
2 comments
2 min read
LW link
(open.substack.com)

What Does a Marginal Grant at LTFF Look Like? Funding Priorities and Grantmaking Thresholds at the Long-Term Future Fund

11 Aug 2023 3:59 UTC
64 points
0 comments
1 min read
LW link
(forum.effectivealtruism.org)

[Question] Will posting any thread on LW guarantee that a LLM will index all my content, and if questions people ask to the LLM after my name will surface up all my LW content?

Alex K. Chen (parrot)
11 Aug 2023 1:40 UTC
0 points
0 comments
1 min read
LW link

AI Safety Concepts Writeup: WebGPT

JustisMills
11 Aug 2023 1:35 UTC
9 points
1 comment
7 min read
LW link

[Question] What is science?

Adam Zerner
11 Aug 2023 0:00 UTC
6 points
4 comments
1 min read
LW link

Three configurable prettyprinters

philh
10 Aug 2023 23:10 UTC
9 points
0 comments
22 min read
LW link
(reasonableapproximation.net)

Ilya Sutskever’s thoughts on AI safety (July 2023): a transcript with my comments

mishka
10 Aug 2023 19:07 UTC
21 points
3 comments
5 min read
LW link

Seeking Input to AI Safety Book for non-technical audience

Darren McKee
10 Aug 2023 17:58 UTC
10 points
4 comments
1 min read
LW link

Evaluating GPT-4 Theory of Mind Capabilities

10 Aug 2023 17:57 UTC
15 points
2 comments
14 min read
LW link

Some alignment ideas

SelonNerias
10 Aug 2023 17:51 UTC
1 point
0 comments
11 min read
LW link

Self Supervised Learning (SSL)

Varshul Gupta
10 Aug 2023 17:43 UTC
5 points
1 comment
2 min read
LW link
(dubverseblack.substack.com)

Predicting Virus Relative Abundance in Wastewater

jefftk
10 Aug 2023 15:46 UTC
33 points
2 comments
1 min read
LW link
(naobservatory.org)

AI #24: Week of the Podcast

Zvi
10 Aug 2023 15:00 UTC
49 points
5 comments
44 min read
LW link
(thezvi.wordpress.com)

Could We Automate AI Alignment Research?

Stephen McAleese
10 Aug 2023 12:17 UTC
27 points
10 comments
21 min read
LW link

The positional embedding matrix and previous-token heads: how do they actually work?

AdamYedidia
10 Aug 2023 1:58 UTC
26 points
4 comments
13 min read
LW link

LLMs are (mostly) not helped by filler tokens

Kshitij Sachan
10 Aug 2023 0:48 UTC
64 points
35 comments
6 min read
LW link

2023 ACX Meetups Everywhere—Newton, MA

duck_master
9 Aug 2023 22:47 UTC
6 points
2 comments
1 min read
LW link

Progress links digest, 2023-08-09: US adds new nuclear, Katalin Karikó interview, and more

jasoncrawford
9 Aug 2023 19:22 UTC
18 points
0 comments
3 min read
LW link
(rootsofprogress.org)

Mech Interp Challenge: August—Deciphering the First Unique Character Model

CallumMcDougall
9 Aug 2023 19:14 UTC
34 points
1 comment
3 min read
LW link

Real Meaning of life has been found. Eliezer discovered it in 2000’s.

Jorterder
9 Aug 2023 18:13 UTC
−15 points
1 comment
1 min read
LW link
(docs.google.com)

Marginal Revolution unofficial birthday party

Derek M. Jones
9 Aug 2023 14:35 UTC
4 points
0 comments
1 min read
LW link

The Case for Convexity

Jesse Richardson
9 Aug 2023 14:09 UTC
19 points
3 comments
1 min read
LW link