How to Give Coming AGI’s the Best Chance of Figuring Out Ethics for Us

sweenesm · May 23, 2024, 7:44 PM
1 point
2 comments · 10 min read · LW link

Mentorship in AGI Safety (MAGIS) call for mentors

May 23, 2024, 6:28 PM
31 points
3 comments · 2 min read · LW link

Quick Thoughts on Scaling Monosemanticity

Joel Burget · May 23, 2024, 4:22 PM
28 points
1 comment · 4 min read · LW link
(transformer-circuits.pub)

The case for stopping AI safety research

catubc · May 23, 2024, 3:55 PM
53 points
38 comments · 1 min read · LW link

[Question] SAE sparse feature graph using only residual layers

Jaehyuk Lim · May 23, 2024, 1:32 PM
0 points
3 comments · 1 min read · LW link

[Question] Are most people deeply confused about “love”, or am I missing a human universal?

SpectrumDT · May 23, 2024, 1:22 PM
13 points
28 comments · 3 min read · LW link

Executive Dysfunction 101

DaystarEld · May 23, 2024, 12:43 PM
28 points
1 comment · 3 min read · LW link
(daystareld.com)

AI #65: I Spy With My AI

Zvi · May 23, 2024, 12:40 PM
28 points
7 comments · 43 min read · LW link
(thezvi.wordpress.com)

What mistakes has the AI safety movement made?

EuanMcLean · May 23, 2024, 11:19 AM
64 points
29 comments · 12 min read · LW link

What should AI safety be trying to achieve?

EuanMcLean · May 23, 2024, 11:17 AM
17 points
1 comment · 13 min read · LW link

What will the first human-level AI look like, and how might things go wrong?

EuanMcLean · May 23, 2024, 11:17 AM
20 points
2 comments · 15 min read · LW link

Big Picture AI Safety: Introduction

EuanMcLean · May 23, 2024, 11:15 AM
46 points
7 comments · 5 min read · LW link

Paper in Science: Managing extreme AI risks amid rapid progress

JanB · May 23, 2024, 8:40 AM
50 points
2 comments · 1 min read · LW link

Power Law Policy

Ben Turtel · May 23, 2024, 5:28 AM
4 points
7 comments · 6 min read · LW link
(bturtel.substack.com)

Why entropy means you might not have to worry as much about superintelligent AI

Ron J · May 23, 2024, 3:52 AM
−26 points
1 comment · 2 min read · LW link

Quick Thoughts on Our First Sampling Run

jefftk · May 23, 2024, 12:20 AM
29 points
3 comments · 2 min read · LW link
(www.jefftk.com)

AI Safety proposal - Influencing the superintelligence explosion

Morgan · May 22, 2024, 11:31 PM
0 points
2 comments · 7 min read · LW link

Implementing Asimov’s Laws of Robotics - How I imagine alignment working.

Joshua Clancy · May 22, 2024, 11:15 PM
2 points
0 comments · 11 min read · LW link

Higher-Order Forecasts

ozziegooen · May 22, 2024, 9:49 PM
45 points
1 comment · LW link

A Positive Double Standard - Self-Help Principles Work For Individuals Not Populations

James Stephen Brown · May 22, 2024, 9:37 PM
8 points
3 comments · 5 min read · LW link

A Bi-Modal Brain Model

Johannes C. Mayer · May 22, 2024, 8:10 PM
12 points
3 comments · 2 min read · LW link

Offering service as a sensayer for simulationist-adjacent beliefs.

mako yass · May 22, 2024, 6:52 PM
22 points
0 comments · 1 min read · LW link

Do Not Mess With Scarlett Johansson

Zvi · May 22, 2024, 3:10 PM
65 points
7 comments · 16 min read · LW link
(thezvi.wordpress.com)

How Multiverse Theory dissolves Quantum inexplicability

mrdlm · May 22, 2024, 2:55 PM
0 points
0 comments · 1 min read · LW link

[Question] Should we be concerned about eating too much soy?

ChristianKl · May 22, 2024, 12:53 PM
18 points
3 comments · 1 min read · LW link

Procedural Executive Function, Part 3

DaystarEld · May 22, 2024, 11:58 AM
20 points
4 comments · LW link

Cicadas, Anthropic, and the bilateral alignment problem

kromem · May 22, 2024, 11:09 AM
28 points
6 comments · 5 min read · LW link

Announcing Human-aligned AI Summer School

May 22, 2024, 8:55 AM
50 points
0 comments · 1 min read · LW link
(humanaligned.ai)

“Which chains-of-thought was that faster than?”

Emrik · May 22, 2024, 8:21 AM
37 points
4 comments · 4 min read · LW link

Each Llama3-8b text uses a different “random” subspace of the activation space

tailcalled · May 22, 2024, 7:31 AM
3 points
4 comments · 7 min read · LW link

ARIA’s Safeguarded AI grant program is accepting applications for Technical Area 1.1 until May 28th

Brendon_Wong · May 22, 2024, 6:54 AM
11 points
0 comments · 1 min read · LW link
(www.aria.org.uk)

Anthropic announces interpretability advances. How much does this advance alignment?

Seth Herd · May 21, 2024, 10:30 PM
49 points
4 comments · 3 min read · LW link
(www.anthropic.com)

[Question] What would stop you from paying for an LLM?

yanni kyriacos · May 21, 2024, 10:25 PM
17 points
15 comments · 1 min read · LW link

EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024

scasper · May 21, 2024, 8:15 PM
157 points
16 comments · 3 min read · LW link

Mitigating extreme AI risks amid rapid progress [Linkpost]

Orpheus16 · May 21, 2024, 7:59 PM
21 points
7 comments · 4 min read · LW link

The problem with rationality

David Loomis · May 21, 2024, 6:49 PM
−17 points
1 comment · 6 min read · LW link

rough draft on what happens in the brain when you have an insight

Emrik · May 21, 2024, 6:02 PM
11 points
2 comments · 1 min read · LW link

On Dwarkesh’s Podcast with OpenAI’s John Schulman

Zvi · May 21, 2024, 5:30 PM
73 points
4 comments · 20 min read · LW link
(thezvi.wordpress.com)

[Question] Is deleting capabilities still a relevant research question?

tailcalled · May 21, 2024, 1:24 PM
15 points
1 comment · 1 min read · LW link

New voluntary commitments (AI Seoul Summit)

Zach Stein-Perlman · May 21, 2024, 11:00 AM
81 points
17 comments · 7 min read · LW link
(www.gov.uk)

ACX/LW/EA/* Meetup Bremen

RasmusHB · May 21, 2024, 5:42 AM
2 points
0 comments · 1 min read · LW link

My Dating Heuristic

Declan Molony · May 21, 2024, 5:28 AM
26 points
4 comments · 2 min read · LW link

Scorable Functions: A Format for Algorithmic Forecasting

ozziegooen · May 21, 2024, 4:14 AM
29 points
0 comments · LW link

The Problem With the Word ‘Alignment’

May 21, 2024, 3:48 AM
63 points
8 comments · 6 min read · LW link

What’s Going on With OpenAI’s Messaging?

ozziegooen · May 21, 2024, 2:22 AM
191 points
13 comments · LW link

Harmony Intelligence is Hiring!

May 21, 2024, 2:11 AM
10 points
0 comments · 1 min read · LW link
(www.harmonyintelligence.com)

[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice.

Linch · May 20, 2024, 11:50 PM
31 points
8 comments · 1 min read · LW link
(variety.com)

Some perspectives on the discipline of Physics

Tahp · May 20, 2024, 6:19 PM
17 points
3 comments · 13 min read · LW link
(quark.rodeo)

[Question] Are there any groupchats for people working on Representation reading/control, activation steering type experiments?

Joe Kwon · May 20, 2024, 6:03 PM
3 points
1 comment · 1 min read · LW link

Interpretability: Integrated Gradients is a decent attribution method

May 20, 2024, 5:55 PM
23 points
7 comments · 6 min read · LW link