AI #15: The Principle of Charity

Zvi · Jun 8, 2023, 12:10 PM
73 points
16 comments · 44 min read
(thezvi.wordpress.com)

A plea for solutionism on AI safety

jasoncrawford · Jun 9, 2023, 4:29 PM
72 points
6 comments · 6 min read
(rootsofprogress.org)

MetaAI: less is less for alignment.

Cleo Nardo · Jun 13, 2023, 2:08 PM
71 points
17 comments · 5 min read

Manifold Predicted the AI Extinction Statement and CAIS Wanted it Deleted

David Chee · Jun 12, 2023, 3:54 PM
71 points
15 comments · 12 min read

LEAst-squares Concept Erasure (LEACE)

tricky_labyrinth · Jun 7, 2023, 9:51 PM
68 points
10 comments · 1 min read
(twitter.com)

Adventist Health Study-2 supports pescetarianism more than veganism

Elizabeth · Jun 17, 2023, 8:10 PM
67 points
11 comments · 6 min read
(acesounderglass.com)

Introduction to Towards Causal Foundations of Safe AGI

Jun 12, 2023, 5:55 PM
67 points
6 comments · 4 min read

“textbooks are all you need”

bhauth · Jun 21, 2023, 5:06 PM
66 points
18 comments · 2 min read
(arxiv.org)

Short timelines and slow, continuous takeoff as the safest path to AGI

Jun 21, 2023, 8:56 AM
65 points
15 comments · 7 min read

The ones who endure

Richard_Ngo · Jun 16, 2023, 2:40 PM
65 points
16 comments · 5 min read
(www.thinkingcomplete.com)

Man in the Arena

Richard_Ngo · Jun 26, 2023, 9:57 PM
65 points
6 comments · 8 min read

A Friendly Face (Another Failure Story)

Jun 20, 2023, 10:31 AM
65 points
21 comments · 16 min read

Which personality traits are real? Stress-testing the lexical hypothesis

tailcalled · Jun 21, 2023, 7:46 PM
65 points
5 comments · 9 min read · 1 review

TASRA: A Taxonomy and Analysis of Societal-Scale Risks from AI

Andrew_Critch · Jun 13, 2023, 5:04 AM
64 points
1 comment · 1 min read

UK Foundation Model Task Force—Expression of Interest

ojorgensen · Jun 18, 2023, 9:43 AM
64 points
2 comments · 1 min read
(twitter.com)

Uncertainty about the future does not imply that AGI will go well

Lauro Langosco · Jun 1, 2023, 5:38 PM
62 points
11 comments · 7 min read

AISafety.info “How can I help?” FAQ

Jun 5, 2023, 10:09 PM
59 points
0 comments · 2 min read

A Double-Feature on The Extropians

Maxwell Tabarrok · Jun 3, 2023, 6:27 PM
59 points
4 comments · 1 min read

Ages Survey: Results

jefftk · Jun 5, 2023, 2:10 AM
57 points
10 comments · 5 min read
(www.jefftk.com)

Contingency: A Conceptual Tool from Evolutionary Biology for Alignment

clem_acs · Jun 12, 2023, 8:54 PM
57 points
2 comments · 14 min read
(acsresearch.org)

[Request]: Use “Epilogenics” instead of “Eugenics” in most circumstances

GeneSmith · Jun 1, 2023, 3:36 PM
56 points
49 comments · 1 min read

A “weak” AGI may attempt an unlikely-to-succeed takeover

RobertM · Jun 28, 2023, 8:31 PM
56 points
17 comments · 3 min read

The Control Problem: Unsolved or Unsolvable?

Remmelt · Jun 2, 2023, 3:42 PM
55 points
46 comments · 14 min read

formalizing the QACI alignment formal-goal

Jun 10, 2023, 3:28 AM
54 points
6 comments · 13 min read
(carado.moe)

Improvement on MIRI’s Corrigibility

Jun 9, 2023, 4:10 PM
54 points
8 comments · 13 min read

DSLT 1. The RLCT Measures the Effective Dimension of Neural Networks

Liam Carroll · Jun 16, 2023, 9:50 AM
54 points
10 comments · 13 min read

Mode collapse in RL may be fueled by the update equation

Jun 19, 2023, 9:51 PM
53 points
10 comments · 8 min read

[Replication] Conjecture’s Sparse Coding in Small Transformers

Jun 16, 2023, 6:02 PM
52 points
0 comments · 5 min read

An Exercise to Build Intuitions on AGI Risk

Lauro Langosco · Jun 7, 2023, 6:35 PM
52 points
3 comments · 8 min read

Are Bayesian methods guaranteed to overfit?

Ege Erdil · Jun 17, 2023, 12:52 PM
52 points
5 comments · 3 min read
(www.yulingyao.com)

AXRP Episode 22 - Shard Theory with Quintin Pope

DanielFilan · Jun 15, 2023, 7:00 PM
52 points
11 comments · 93 min read

InternLM—China’s Best (Unverified)

Lao Mein · Jun 9, 2023, 7:39 AM
51 points
4 comments · 1 min read

A moral backlash against AI will probably slow down AGI development

geoffreymiller · Jun 7, 2023, 8:39 PM
51 points
10 comments · 14 min read

How to Think About Activation Patching

Neel Nanda · Jun 4, 2023, 2:17 PM
50 points
5 comments · 20 min read
(www.neelnanda.io)

Crystal Healing — or the Origins of Expected Utility Maximizers

Jun 25, 2023, 3:18 AM
50 points
11 comments · 5 min read

The Case for Overconfidence is Overstated

Kevin Dorst · Jun 28, 2023, 5:21 PM
50 points
13 comments · 8 min read
(kevindorst.substack.com)

Causality: A Brief Introduction

Jun 20, 2023, 3:01 PM
49 points
18 comments · 6 min read

Instrumental Convergence? [Draft]

J. Dmitri Gallow · Jun 14, 2023, 8:21 PM
48 points
20 comments · 33 min read

Elon talked with senior Chinese leadership about AI X-risk

ChristianKl · Jun 7, 2023, 3:02 PM
47 points
2 comments · 1 min read
(www.youtube.com)

“Safety Culture for AI” is important, but isn’t going to be easy

Davidmanheim · Jun 26, 2023, 12:52 PM
47 points
2 comments · 2 min read
(forum.effectivealtruism.org)

My impression of singular learning theory

Ege Erdil · Jun 18, 2023, 3:34 PM
47 points
30 comments · 2 min read

AI #18: The Great Debate Debate

Zvi · Jun 29, 2023, 4:20 PM
47 points
9 comments · 52 min read
(thezvi.wordpress.com)

Updating Drexler’s CAIS model

Matthew Barnett · Jun 16, 2023, 10:53 PM
47 points
32 comments · 4 min read

AI #16: AI in the UK

Zvi · Jun 15, 2023, 1:20 PM
46 points
20 comments · 54 min read
(thezvi.wordpress.com)

Agentic Mess (A Failure Story)

Jun 6, 2023, 1:09 PM
46 points
5 comments · 13 min read

I can see how I am Dumb

Johannes C. Mayer · Jun 10, 2023, 7:18 PM
46 points
11 comments · 5 min read

Ban development of unpredictable powerful models?

TurnTrout · Jun 20, 2023, 1:43 AM
46 points
25 comments · 4 min read

We Are Less Wrong than E. T. Jaynes on Loss Functions in Human Society

Zack_M_Davis · Jun 5, 2023, 5:34 AM
46 points
14 comments · 2 min read

Why am I Me?

dadadarren · Jun 25, 2023, 12:07 PM
45 points
46 comments · 3 min read

Self-Blinded Caffeine RCT

niplav · Jun 27, 2023, 12:38 PM
45 points
9 comments · 8 min read