Thoughts on the AI Safety Summit company policy requests and responses

So8res · Oct 31, 2023, 11:54 PM
169 points
14 comments · 10 min read · LW link

AISN #25: White House Executive Order on AI, UK AI Safety Summit, and Progress on Voluntary Evaluations of AI Risks

Dan H · Oct 31, 2023, 7:34 PM
35 points
1 comment · 6 min read · LW link
(newsletter.safe.ai)

If AIs become self-aware, what religion will they have?

mnvr · Oct 31, 2023, 5:29 PM
−17 points
3 comments · 4 min read · LW link

Self-Blinded L-Theanine RCT

niplav · Oct 31, 2023, 3:24 PM
53 points
12 comments · 3 min read · LW link

AI Safety 101 - Chapter 5.2 - Unrestricted Adversarial Training

Charbel-Raphaël · Oct 31, 2023, 2:34 PM
17 points
0 comments · 19 min read · LW link

Preventing Language Models from hiding their reasoning

Oct 31, 2023, 2:34 PM
119 points
15 comments · 12 min read · LW link · 1 review

AI Safety 101 - Chapter 5.1 - Debate

Charbel-Raphaël · Oct 31, 2023, 2:29 PM
15 points
0 comments · 13 min read · LW link

M&A in AI

Hauke Hillebrandt · Oct 31, 2023, 12:20 PM
2 points
0 comments · LW link

Urging an International AI Treaty: An Open Letter

Olli Järviniemi · Oct 31, 2023, 11:26 AM
48 points
2 comments · 1 min read · LW link
(aitreaty.org)

[Closed] Agent Foundations track in MATS

Vanessa Kosoy · Oct 31, 2023, 8:12 AM
54 points
1 comment · 1 min read · LW link
(www.matsprogram.org)

Intrinsic Drives and Extrinsic Misuse: Two Intertwined Risks of AI

jsteinhardt · Oct 31, 2023, 5:10 AM
40 points
0 comments · 12 min read · LW link
(bounded-regret.ghost.io)

Focus on existential risk is a distraction from the real issues. A false fallacy

Nik Samoylov · Oct 30, 2023, 11:42 PM
−19 points
11 comments · 2 min read · LW link

Will releasing the weights of large language models grant widespread access to pandemic agents?

jefftk · Oct 30, 2023, 6:22 PM
47 points
25 comments · LW link
(arxiv.org)

[Linkpost] Two major announcements in AI governance today

Angélina · Oct 30, 2023, 5:28 PM
1 point
1 comment · 1 min read · LW link
(www.whitehouse.gov)

Grokking Beyond Neural Networks

Jack Miller · Oct 30, 2023, 5:28 PM
10 points
0 comments · 2 min read · LW link
(arxiv.org)

Response to “Coordinated pausing: An evaluation-based coordination scheme for frontier AI developers”

Matthew Wearden · Oct 30, 2023, 5:27 PM
5 points
2 comments · 6 min read · LW link
(matthewwearden.co.uk)

Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations

Zeming Wei · Oct 30, 2023, 5:22 PM
3 points
1 comment · 1 min read · LW link

5 Reasons Why Governments/Militaries Already Want AI for Information Warfare

trevor · Oct 30, 2023, 4:30 PM
32 points
0 comments · 10 min read · LW link

[Linkpost] Biden-Harris Executive Order on AI

beren · Oct 30, 2023, 3:20 PM
3 points
0 comments · 1 min read · LW link

AI Alignment [progress] this Week (10/29/2023)

Logan Zoellner · Oct 30, 2023, 3:02 PM
15 points
4 comments · 6 min read · LW link
(midwitalignment.substack.com)

Improving the Welfare of AIs: A Nearcasted Proposal

ryan_greenblatt · Oct 30, 2023, 2:51 PM
114 points
9 comments · 20 min read · LW link · 1 review

President Biden Issues Executive Order on Safe, Secure, and Trustworthy Artificial Intelligence

Tristan Williams · Oct 30, 2023, 11:15 AM
171 points
39 comments · LW link
(www.whitehouse.gov)

GPT-2 XL’s capacity for coherence and ontology clustering

MiguelDev · Oct 30, 2023, 9:24 AM
6 points
2 comments · 41 min read · LW link

Charbel-Raphaël and Lucius discuss interpretability

Oct 30, 2023, 5:50 AM
111 points
7 comments · 21 min read · LW link

Multi-Winner 3-2-1 Voting

Yoav Ravid · Oct 30, 2023, 3:31 AM
14 points
6 comments · 3 min read · LW link

math terminology as convolution

bhauth · Oct 30, 2023, 1:05 AM
34 points
1 comment · 4 min read · LW link
(www.bhauth.com)

Grokking, memorization, and generalization — a discussion

Oct 29, 2023, 11:17 PM
75 points
11 comments · 23 min read · LW link

Comp Sci in 2027 (Short story by Eliezer Yudkowsky)

sudo · Oct 29, 2023, 11:09 PM
201 points
24 comments · 10 min read · LW link · 1 review
(nitter.net)

Mathematically-Defined Optimization Captures A Lot of Useful Information

J Bostock · Oct 29, 2023, 5:17 PM
19 points
0 comments · 2 min read · LW link

Clarifying the free energy principle (with quotes)

Ryo · Oct 29, 2023, 4:03 PM
8 points
0 comments · 9 min read · LW link

A new intro to Quantum Physics, with the math fixed

titotal · Oct 29, 2023, 3:11 PM
113 points
24 comments · 17 min read · LW link
(titotal.substack.com)

My idea of sacredness, divinity, and religion

Kaj_Sotala · Oct 29, 2023, 12:50 PM
40 points
10 comments · 4 min read · LW link
(kajsotala.fi)

The AI Boom Mainly Benefits Big Firms, but long-term, markets will concentrate

Hauke Hillebrandt · Oct 29, 2023, 8:38 AM
−1 points
0 comments · LW link

What’s up with “Responsible Scaling Policies”?

Oct 29, 2023, 4:17 AM
99 points
9 comments · 20 min read · LW link · 1 review

Experiments as a Third Alternative

Adam Zerner · Oct 29, 2023, 12:39 AM
48 points
21 comments · 5 min read · LW link

Comparing representation vectors between llama 2 base and chat

Nina Panickssery · Oct 28, 2023, 10:54 PM
36 points
5 comments · 2 min read · LW link

Vaniver’s thoughts on Anthropic’s RSP

Vaniver · Oct 28, 2023, 9:06 PM
46 points
4 comments · 3 min read · LW link

Book Review: Orality and Literacy: The Technologizing of the Word

Fergus Fettes · Oct 28, 2023, 8:12 PM
13 points
0 comments · 16 min read · LW link

Regrant up to $600,000 to AI safety projects with GiveWiki

Dawn Drescher · Oct 28, 2023, 7:56 PM
33 points
1 comment · LW link

Shane Legg interview on alignment

Seth Herd · Oct 28, 2023, 7:28 PM
66 points
20 comments · 2 min read · LW link
(www.youtube.com)

AI Existential Safety Fellowships

mmfli · Oct 28, 2023, 6:07 PM
5 points
0 comments · 1 min read · LW link

AI Safety Hub Serbia Official Opening

Oct 28, 2023, 5:03 PM
55 points
0 comments · 3 min read · LW link
(forum.effectivealtruism.org)

Managing AI Risks in an Era of Rapid Progress

Algon · Oct 28, 2023, 3:48 PM
36 points
5 comments · 11 min read · LW link
(managing-ai-risks.com)

[Question] ELI5 Why isn’t alignment *easier* as models get stronger?

Logan Zoellner · Oct 28, 2023, 2:34 PM
3 points
9 comments · 1 min read · LW link

Truthseeking, EA, Simulacra levels, and other stuff

Oct 27, 2023, 11:56 PM
45 points
12 comments · 9 min read · LW link

[Question] Do you believe “E=mc^2” is a correct and/or useful equation, and, whether yes or no, precisely what are your reasons for holding this belief (with such a degree of confidence)?

l8c · Oct 27, 2023, 10:46 PM
10 points
14 comments · 1 min read · LW link

Value systematization: how values become coherent (and misaligned)

Richard_Ngo · Oct 27, 2023, 7:06 PM
103 points
49 comments · 13 min read · LW link

Techno-humanism is techno-optimism for the 21st century

Richard_Ngo · Oct 27, 2023, 6:37 PM
88 points
5 comments · 14 min read · LW link
(www.mindthefuture.info)

Sanctuary for Humans

Nikola Jurkovic · Oct 27, 2023, 6:08 PM
22 points
9 comments · 1 min read · LW link

Wireheading and misalignment by composition on NetHack

pierlucadoro · Oct 27, 2023, 5:43 PM
34 points
4 comments · 4 min read · LW link