Focus on existential risk is a distraction from the real issues. A false fallacy

Nik Samoylov · 30 Oct 2023 23:42 UTC
−19 points
11 comments · 2 min read · LW link

Will releasing the weights of large language models grant widespread access to pandemic agents?

jefftk · 30 Oct 2023 18:22 UTC
46 points
25 comments · 1 min read · LW link
(arxiv.org)

[Linkpost] Two major announcements in AI governance today

Angélina · 30 Oct 2023 17:28 UTC
1 point
1 comment · 1 min read · LW link
(www.whitehouse.gov)

Grokking Beyond Neural Networks

Jack Miller · 30 Oct 2023 17:28 UTC
9 points
0 comments · 2 min read · LW link
(arxiv.org)

Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”

Matthew Wearden · 30 Oct 2023 17:27 UTC
5 points
2 comments · 6 min read · LW link
(matthewwearden.co.uk)

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Zeming Wei · 30 Oct 2023 17:22 UTC
3 points
1 comment · 1 min read · LW link

5 Reasons Why Governments/Militaries Already Want AI for Information Warfare

trevor · 30 Oct 2023 16:30 UTC
32 points
0 comments · 10 min read · LW link

[Linkpost] Biden-Harris Executive Order on AI

beren · 30 Oct 2023 15:20 UTC
3 points
0 comments · 1 min read · LW link

AI Alignment [progress] this Week (10/29/2023)

Logan Zoellner · 30 Oct 2023 15:02 UTC
15 points
4 comments · 6 min read · LW link
(midwitalignment.substack.com)

Improving the Welfare of AIs: A Nearcasted Proposal

ryan_greenblatt · 30 Oct 2023 14:51 UTC
87 points
5 comments · 20 min read · LW link

President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence

Tristan Williams · 30 Oct 2023 11:15 UTC
170 points
39 comments · 1 min read · LW link
(www.whitehouse.gov)

GPT-2 XL’s capacity for coherence and ontology clustering

MiguelDev · 30 Oct 2023 9:24 UTC
6 points
2 comments · 41 min read · LW link

Charbel-Raphaël and Lucius discuss Interpretability

30 Oct 2023 5:50 UTC
104 points
7 comments · 21 min read · LW link

Multi-Winner 3-2-1 Voting

Yoav Ravid · 30 Oct 2023 3:31 UTC
12 points
5 comments · 3 min read · LW link

math terminology as convolution

bhauth · 30 Oct 2023 1:05 UTC
34 points
1 comment · 4 min read · LW link
(www.bhauth.com)

Grokking, memorization, and generalization — a discussion

29 Oct 2023 23:17 UTC
63 points
10 comments · 23 min read · LW link

Comp Sci in 2027 (Short story by Eliezer Yudkowsky)

sudo · 29 Oct 2023 23:09 UTC
141 points
22 comments · 10 min read · LW link
(nitter.net)

Mathematically-Defined Optimization Captures A Lot of Useful Information

J Bostock · 29 Oct 2023 17:17 UTC
19 points
0 comments · 2 min read · LW link

Clarifying the free energy principle (with quotes)

Ryo · 29 Oct 2023 16:03 UTC
8 points
0 comments · 9 min read · LW link

A new intro to Quantum Physics, with the math fixed

titotal · 29 Oct 2023 15:11 UTC
112 points
22 comments · 17 min read · LW link
(titotal.substack.com)

My idea of sacredness, divinity, and religion

Kaj_Sotala · 29 Oct 2023 12:50 UTC
39 points
9 comments · 4 min read · LW link
(kajsotala.fi)

The AI Boom Mainly Benefits Big Firms, but long-term, markets will concentrate

Hauke Hillebrandt · 29 Oct 2023 8:38 UTC
−1 points
0 comments · 1 min read · LW link

What’s up with “Responsible Scaling Policies”?

29 Oct 2023 4:17 UTC
99 points
8 comments · 20 min read · LW link

Experiments as a Third Alternative

Adam Zerner · 29 Oct 2023 0:39 UTC
48 points
19 comments · 5 min read · LW link

Comparing representation vectors between llama 2 base and chat

Nina Rimsky · 28 Oct 2023 22:54 UTC
36 points
5 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · 28 Oct 2023 21:06 UTC
46 points
4 comments · 3 min read · LW link

Book Review: Orality and Literacy: The Technologizing of the Word

Fergus Fettes · 28 Oct 2023 20:12 UTC
13 points
0 comments · 16 min read · LW link

Regrant up to $600,000 to AI safety projects with GiveWiki

Dawn Drescher · 28 Oct 2023 19:56 UTC
33 points
1 comment · 1 min read · LW link

Shane Legg interview on alignment

Seth Herd · 28 Oct 2023 19:28 UTC
66 points
20 comments · 2 min read · LW link
(www.youtube.com)

AI Existential Safety Fellowships

mmfli · 28 Oct 2023 18:07 UTC
5 points
0 comments · 1 min read · LW link

AI Safety Hub Serbia Official Opening

28 Oct 2023 17:03 UTC
51 points
0 comments · 3 min read · LW link
(forum.effectivealtruism.org)

Managing AI Risks in an Era of Rapid Progress

Algon · 28 Oct 2023 15:48 UTC
30 points
3 comments · 11 min read · LW link
(managing-ai-risks.com)

[Question] ELI5 Why isn’t alignment *easier* as models get stronger?

Logan Zoellner · 28 Oct 2023 14:34 UTC
3 points
9 comments · 1 min read · LW link

Truthseeking, EA, Simulacra levels, and other stuff

27 Oct 2023 23:56 UTC
44 points
12 comments · 9 min read · LW link

[Question] Do you believe “E=mc^2” is a correct and/or useful equation, and, whether yes or no, precisely what are your reasons for holding this belief (with such a degree of confidence)?

l8c · 27 Oct 2023 22:46 UTC
10 points
14 comments · 1 min read · LW link

Value systematization: how values become coherent (and misaligned)

Richard_Ngo · 27 Oct 2023 19:06 UTC
95 points
47 comments · 13 min read · LW link

Techno-humanism is techno-optimism for the 21st century

Richard_Ngo · 27 Oct 2023 18:37 UTC
88 points
5 comments · 14 min read · LW link
(www.mindthefuture.info)

Sanctuary for Humans

nikola · 27 Oct 2023 18:08 UTC
21 points
9 comments · 1 min read · LW link

Wireheading and misalignment by composition on NetHack

pierlucadoro · 27 Oct 2023 17:43 UTC
34 points
4 comments · 4 min read · LW link

We’re Not Ready: thoughts on “pausing” and responsible scaling policies

HoldenKarnofsky · 27 Oct 2023 15:19 UTC
199 points
33 comments · 8 min read · LW link

Aspiration-based Q-Learning

27 Oct 2023 14:42 UTC
38 points
5 comments · 11 min read · LW link

Linkpost: Rishi Sunak’s Speech on AI (26th October)

bideup · 27 Oct 2023 11:57 UTC
85 points
8 comments · 7 min read · LW link
(www.gov.uk)

ASPR & WARP: Rationality Camps for Teens in Taiwan and Oxford

Anna Gajdova · 27 Oct 2023 8:40 UTC
18 points
0 comments · 1 min read · LW link

[Question] To what extent is the UK Government’s recent AI Safety push entirely due to Rishi Sunak?

Stephen Fowler · 27 Oct 2023 3:29 UTC
23 points
4 comments · 1 min read · LW link

Bayesian Punishment

Rob Lucas · 27 Oct 2023 3:24 UTC
1 point
1 comment · 6 min read · LW link

Online Dialogues Party — Sunday 5th November

Ben Pace · 27 Oct 2023 2:41 UTC
28 points
1 comment · 1 min read · LW link

OpenAI’s new Preparedness team is hiring

leopold · 26 Oct 2023 20:42 UTC
60 points
2 comments · 1 min read · LW link

Fake Deeply

Zack_M_Davis · 26 Oct 2023 19:55 UTC
33 points
7 comments · 1 min read · LW link
(unremediatedgender.space)

Symbol/Referent Confusions in Language Model Alignment Experiments

johnswentworth · 26 Oct 2023 19:49 UTC
93 points
44 comments · 6 min read · LW link

Unsupervised Methods for Concept Discovery in AlphaZero

aogara · 26 Oct 2023 19:05 UTC
9 points
0 comments · 1 min read · LW link
(arxiv.org)