D&D.Sci Tax Day: Adventurers and Assessments

aphyer · 15 Apr 2025 23:43 UTC
47 points
14 comments · 2 min read · LW link

Should AIs be Encouraged to Cooperate?

PeterMcCluskey · 15 Apr 2025 21:57 UTC
13 points
2 comments · 5 min read · LW link
(bayesianinvestor.com)

OpenAI rewrote its Preparedness Framework

Zach Stein-Perlman · 15 Apr 2025 20:00 UTC
36 points
1 comment · 6 min read · LW link

ASI existential risk: Reconsidering Alignment as a Goal

habryka · 15 Apr 2025 19:57 UTC
93 points
14 comments · 19 min read · LW link
(michaelnotebook.com)

Nucleic Acid Observatory Updates, April 2025

jefftk · 15 Apr 2025 18:58 UTC
27 points
0 comments · 4 min read · LW link
(naobservatory.org)

Some OthelloGPT Circuits

Alfred Wong · 15 Apr 2025 18:41 UTC
7 points
0 comments · 7 min read · LW link

The Mirror Problem in AI: Why Language Models Say Whatever You Want

RobT · 15 Apr 2025 18:40 UTC
9 points
2 comments · 3 min read · LW link

What happens when LLMs learn new things? & Continual learning forever.

sunchipsster · 15 Apr 2025 18:38 UTC
4 points
1 comment · 7 min read · LW link

To be legible, evidence of misalignment probably has to be behavioral

ryan_greenblatt · 15 Apr 2025 18:14 UTC
57 points
19 comments · 3 min read · LW link

AISN #51: AI Frontiers

15 Apr 2025 16:01 UTC
8 points
1 comment · 5 min read · LW link
(newsletter.safe.ai)

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

Kaj_Sotala · 15 Apr 2025 15:56 UTC
174 points
52 comments · 18 min read · LW link

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing

Zvi · 15 Apr 2025 15:30 UTC
48 points
3 comments · 12 min read · LW link
(thezvi.wordpress.com)

The real reason AI benchmarks haven’t reflected economic impacts

Noosphere89 · 15 Apr 2025 13:44 UTC
15 points
0 comments · 1 min read · LW link
(epoch.ai)

Map of AI Safety v2

15 Apr 2025 13:04 UTC
64 points
4 comments · 1 min read · LW link

3M Subscriber YouTube Account ‘Channel 5’ Reporting On Rationalism

sakraf · 15 Apr 2025 13:02 UTC
4 points
0 comments · 1 min read · LW link
(youtu.be)

Can SAE steering reveal sandbagging?

15 Apr 2025 12:33 UTC
35 points
3 comments · 4 min read · LW link

Risers for Foot Percussion

jefftk · 15 Apr 2025 11:10 UTC
9 points
2 comments · 1 min read · LW link
(www.jefftk.com)

What empirical research directions has Eliezer commented positively on?

Chris_Leong · 15 Apr 2025 8:53 UTC
8 points
1 comment · 1 min read · LW link

Why Does It Feel Like Something? An Evolutionary Path to Subjectivity

gmax · 15 Apr 2025 8:38 UTC
1 point
18 comments · 10 min read · LW link

How to Defend the Indefensible

Alex Beyman · 15 Apr 2025 7:45 UTC
5 points
1 comment · 21 min read · LW link

A Talmudic Rationalist Cautionary Tale

Noah Birnbaum · 15 Apr 2025 4:11 UTC
13 points
2 comments · 2 min read · LW link

Creating ‘Making God’: a Feature Documentary on risks from AGI

Connor Axiotes · 15 Apr 2025 2:56 UTC
4 points
0 comments · 7 min read · LW link

A Dissent on Honesty

eva_ · 15 Apr 2025 2:43 UTC
44 points
52 comments · 14 min read · LW link

$500 bounty for best short-form fiction about our near future world; $100 for recommending winning piece: new “Art of Near Future World” quarterly art project

Ramon Gonzalez · 15 Apr 2025 0:46 UTC
6 points
1 comment · 2 min read · LW link

What if there was a nuke in Manhattan and why that could be a good thing

Ratburn · 15 Apr 2025 0:19 UTC
3 points
11 comments · 3 min read · LW link

Nihilism Is Not Enough By Peter Thiel

shawkisukkar · 15 Apr 2025 0:13 UTC
6 points
4 comments · 1 min read · LW link
(www.nihilismisnotenough.com)

Correcting Deceptive Alignment using a Deontological Approach

JeaniceK · 14 Apr 2025 22:07 UTC
8 points
0 comments · 7 min read · LW link

Religious Persistence: A Missing Primitive for Robust Alignment

lauriewired · 14 Apr 2025 22:03 UTC
6 points
3 comments · 8 min read · LW link

The 4-Minute Mile Effect

Parker Conley · 14 Apr 2025 21:41 UTC
32 points
6 comments · 2 min read · LW link
(parconley.com)

Lightning Talks!

nathandunkerley · 14 Apr 2025 20:39 UTC
1 point
0 comments · 1 min read · LW link

The Bell Curve of Bad Behavior

Screwtape · 14 Apr 2025 19:58 UTC
54 points
6 comments · 10 min read · LW link

Sentinel’s Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.

NunoSempere · 14 Apr 2025 19:11 UTC
42 points
0 comments · 2 min read · LW link
(blog.sentinel-team.org)

Sam Altman’s sister claims Sam sexually abused her—Part 7: Timeline, continued

pythagoras5015 · 14 Apr 2025 17:43 UTC
2 points
0 comments · 36 min read · LW link

Sam Altman’s sister claims Sam sexually abused her—Part 8: Timeline, continued

pythagoras5015 · 14 Apr 2025 17:42 UTC
4 points
0 comments · 71 min read · LW link

Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

Adam Karvonen · 14 Apr 2025 17:38 UTC
158 points
42 comments · 7 min read · LW link
(adamkarvonen.github.io)

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence

14 Apr 2025 16:45 UTC
29 points
1 comment · 2 min read · LW link

Applications Open for Impact Accelerator Program for Experienced Professionals

Clark Wisenbaker · 14 Apr 2025 16:27 UTC
1 point
0 comments · 3 min read · LW link

The Last Light

Bridgett Kay · 14 Apr 2025 15:41 UTC
31 points
2 comments · 4 min read · LW link

Offer: Team Conflict Counseling for AI Safety Orgs

Severin T. Seehrich · 14 Apr 2025 15:17 UTC
19 points
1 comment · 1 min read · LW link

Slopworld 2035: The dangers of mediocre AI

titotal · 14 Apr 2025 13:14 UTC
22 points
6 comments · 29 min read · LW link
(titotal.substack.com)

Try training token-level probes

StefanHex · 14 Apr 2025 11:56 UTC
47 points
6 comments · 8 min read · LW link

Monthly Roundup #29: April 2025

Zvi · 14 Apr 2025 11:50 UTC
23 points
7 comments · 24 min read · LW link
(thezvi.wordpress.com)

A Solution to Sandbagging and other Self-Provable Misalignment: Constitutional AI Detectives

Knight Lee · 14 Apr 2025 10:27 UTC
−3 points
2 comments · 4 min read · LW link

One-shot steering vectors cause emergent misalignment, too

Jacob Dunefsky · 14 Apr 2025 6:40 UTC
98 points
6 comments · 11 min read · LW link

Unbendable Arm as Test Case for Religious Belief

Ivan Vendrov · 14 Apr 2025 1:57 UTC
28 points
39 comments · 2 min read · LW link
(nothinghuman.substack.com)

Sam Altman’s sister claims Sam sexually abused her—Part 5: Timeline, continued

pythagoras5015 · 14 Apr 2025 1:00 UTC
1 point
0 comments · 125 min read · LW link

Luna Lovegood and the Chamber of Secrets, Part 5

14 Apr 2025 0:10 UTC
4 points
0 comments · 3 min read · LW link

Sam Altman’s sister claims Sam sexually abused her—Part 4: Timeline, continued

pythagoras5015 · 13 Apr 2025 23:41 UTC
1 point
0 comments · 51 min read · LW link

The Structure of the Pain of Change

ReverendBayes · 13 Apr 2025 21:51 UTC
7 points
0 comments · 10 min read · LW link

Luna Lovegood and the Chamber of Secrets, Part 4

13 Apr 2025 20:55 UTC
3 points
0 comments · 4 min read · LW link