AI Safety proposal—Influencing the superintelligence explosion
Morgan · 22 May 2024 23:31 UTC · 0 points · 2 comments · 7 min read · LW link

Implementing Asimov’s Laws of Robotics—How I imagine alignment working.
Joshua Clancy · 22 May 2024 23:15 UTC · 2 points · 0 comments · 11 min read · LW link

Higher-Order Forecasts
ozziegooen · 22 May 2024 21:49 UTC · 45 points · 1 comment · 3 min read · LW link

A Positive Double Standard—Self-Help Principles Work For Individuals Not Populations
James Stephen Brown · 22 May 2024 21:37 UTC · 8 points · 3 comments · 5 min read · LW link

A Bi-Modal Brain Model
Johannes C. Mayer · 22 May 2024 20:10 UTC · 12 points · 3 comments · 2 min read · LW link

Offering service as a sensayer for simulationist-adjacent beliefs.
mako yass · 22 May 2024 18:52 UTC · 22 points · 0 comments · 1 min read · LW link

Do Not Mess With Scarlett Johansson
Zvi · 22 May 2024 15:10 UTC · 65 points · 7 comments · 16 min read · LW link
(thezvi.wordpress.com)

How Multiverse Theory dissolves Quantum inexplicability
mrdlm · 22 May 2024 14:55 UTC · 0 points · 0 comments · 1 min read · LW link

[Question] Should we be concerned about eating too much soy?
ChristianKl · 22 May 2024 12:53 UTC · 18 points · 3 comments · 1 min read · LW link

Procedural Executive Function, Part 3
DaystarEld · 22 May 2024 11:58 UTC · 21 points · 4 comments · 23 min read · LW link

Cicadas, Anthropic, and the bilateral alignment problem
kromem · 22 May 2024 11:09 UTC · 28 points · 6 comments · 5 min read · LW link

Announcing Human-aligned AI Summer School
22 May 2024 8:55 UTC · 51 points · 0 comments · 1 min read · LW link
(humanaligned.ai)

“Which chains-of-thought was that faster than?”
Emrik · 22 May 2024 8:21 UTC · 37 points · 4 comments · 4 min read · LW link

Each Llama3-8b text uses a different “random” subspace of the activation space
tailcalled · 22 May 2024 7:31 UTC · 3 points · 4 comments · 7 min read · LW link

ARIA’s Safeguarded AI grant program is accepting applications for Technical Area 1.1 until May 28th
Brendon_Wong · 22 May 2024 6:54 UTC · 11 points · 0 comments · 1 min read · LW link
(www.aria.org.uk)

Anthropic announces interpretability advances. How much does this advance alignment?
Seth Herd · 21 May 2024 22:30 UTC · 49 points · 4 comments · 3 min read · LW link
(www.anthropic.com)

[Question] What would stop you from paying for an LLM?
yanni kyriacos · 21 May 2024 22:25 UTC · 17 points · 15 comments · 1 min read · LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024
scasper · 21 May 2024 20:15 UTC · 157 points · 16 comments · 3 min read · LW link

Mitigating extreme AI risks amid rapid progress [Linkpost]
Orpheus16 · 21 May 2024 19:59 UTC · 21 points · 7 comments · 4 min read · LW link

The problem with rationality
David Loomis · 21 May 2024 18:49 UTC · −17 points · 1 comment · 6 min read · LW link

rough draft on what happens in the brain when you have an insight
Emrik · 21 May 2024 18:02 UTC · 11 points · 2 comments · 1 min read · LW link

On Dwarkesh’s Podcast with OpenAI’s John Schulman
Zvi · 21 May 2024 17:30 UTC · 73 points · 4 comments · 20 min read · LW link
(thezvi.wordpress.com)

[Question] Is deleting capabilities still a relevant research question?
tailcalled · 21 May 2024 13:24 UTC · 15 points · 1 comment · 1 min read · LW link

New voluntary commitments (AI Seoul Summit)
Zach Stein-Perlman · 21 May 2024 11:00 UTC · 81 points · 17 comments · 7 min read · LW link
(www.gov.uk)

ACX/LW/EA/* Meetup Bremen
RasmusHB · 21 May 2024 5:42 UTC · 2 points · 0 comments · 1 min read · LW link

My Dating Heuristic
Declan Molony · 21 May 2024 5:28 UTC · 27 points · 4 comments · 2 min read · LW link

Scorable Functions: A Format for Algorithmic Forecasting
ozziegooen · 21 May 2024 4:14 UTC · 29 points · 0 comments · 8 min read · LW link

The Problem With the Word ‘Alignment’
21 May 2024 3:48 UTC · 63 points · 8 comments · 6 min read · LW link

What’s Going on With OpenAI’s Messaging?
ozziegooen · 21 May 2024 2:22 UTC · 191 points · 13 comments · 3 min read · LW link

Harmony Intelligence is Hiring!
21 May 2024 2:11 UTC · 10 points · 0 comments · 1 min read · LW link
(www.harmonyintelligence.com)

[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice.
Linch · 20 May 2024 23:50 UTC · 31 points · 8 comments · 1 min read · LW link
(variety.com)

Some perspectives on the discipline of Physics
Tahp · 20 May 2024 18:19 UTC · 18 points · 3 comments · 13 min read · LW link
(quark.rodeo)

[Question] Are there any groupchats for people working on Representation reading/control, activation steering type experiments?
Joe Kwon · 20 May 2024 18:03 UTC · 3 points · 1 comment · 1 min read · LW link

Interpretability: Integrated Gradients is a decent attribution method
20 May 2024 17:55 UTC · 23 points · 7 comments · 6 min read · LW link

The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
20 May 2024 17:53 UTC · 108 points · 4 comments · 3 min read · LW link

NAO Updates, Spring 2024
jefftk · 20 May 2024 16:51 UTC · 13 points · 0 comments · 6 min read · LW link
(naobservatory.org)

OpenAI: Exodus
Zvi · 20 May 2024 13:10 UTC · 153 points · 26 comments · 44 min read · LW link
(thezvi.wordpress.com)

Infra-Bayesian haggling
hannagabor · 20 May 2024 12:23 UTC · 28 points · 0 comments · 20 min read · LW link

Jaan Tallinn’s 2023 Philanthropy Overview
jaan · 20 May 2024 12:11 UTC · 203 points · 5 comments · 1 min read · LW link
(jaan.info)

D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]
abstractapplic · 20 May 2024 9:38 UTC · 31 points · 2 comments · 1 min read · LW link

Why I find Davidad’s plan interesting
Paul W · 20 May 2024 8:13 UTC · 18 points · 0 comments · 6 min read · LW link

Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds · 20 May 2024 4:14 UTC · 30 points · 21 comments · 10 min read · LW link
(www.anthropic.com)

The consistent guessing problem is easier than the halting problem
jessicata · 20 May 2024 4:02 UTC · 38 points · 5 comments · 4 min read · LW link
(unstableontology.com)

A poem titled ‘Tick Tock’.
Krantz · 20 May 2024 3:52 UTC · −1 points · 0 comments · 1 min read · LW link

Against Computers (infinite play)
rogersbacon · 20 May 2024 0:43 UTC · −11 points · 1 comment · 14 min read · LW link
(www.secretorum.life)

Testing for parallel reasoning in LLMs
19 May 2024 15:28 UTC · 9 points · 7 comments · 9 min read · LW link

Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom)
O O · 19 May 2024 2:18 UTC · 14 points · 15 comments · 2 min read · LW link

Some “meta-cruxes” for AI x-risk debates
Aryeh Englander · 19 May 2024 0:21 UTC · 20 points · 2 comments · 3 min read · LW link

On Privilege
Shmi · 18 May 2024 22:36 UTC · 16 points · 10 comments · 2 min read · LW link

Fund me please—I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University
Johannes C. Mayer · 18 May 2024 19:53 UTC · 22 points · 37 comments · 6 min read · LW link