ALLFED emergency appeal: Help us raise $800,000 to avoid cutting half of programs

denkenberger · 16 Apr 2025 21:47 UTC
49 points
9 comments · 3 min read · LW link

Prodromes and Biomarkers in Chronic Disease

sarahconstantin · 16 Apr 2025 21:30 UTC
23 points
2 comments · 3 min read · LW link
(sarahconstantin.substack.com)

The Practical Imperative for AI Control Research

Archana Vaidheeswaran · 16 Apr 2025 20:27 UTC
1 point
0 comments · 4 min read · LW link

METR’s preliminary evaluation of o3 and o4-mini

Christopher King · 16 Apr 2025 20:23 UTC
14 points
7 comments · 1 min read · LW link
(metr.github.io)

Mass Exposure Paradox

max-sixty · 16 Apr 2025 20:18 UTC
6 points
2 comments · 2 min read · LW link

GPT-4.5 is Cognitive Empathy, Sonnet 3.5 is Affective Empathy

Jack · 16 Apr 2025 19:12 UTC
15 points
2 comments · 4 min read · LW link

GPT-4.1 Is a Mini Upgrade

Zvi · 16 Apr 2025 19:00 UTC
31 points
6 comments · 8 min read · LW link
(thezvi.wordpress.com)

Doing Prioritization Better

arvomm · 16 Apr 2025 18:46 UTC
3 points
1 comment · 19 min read · LW link
(forum.effectivealtruism.org)

Kamelo: A Rule-Based Constructed Language for Universal, Logical Communication

Saif Khan · 16 Apr 2025 18:44 UTC
12 points
7 comments · 2 min read · LW link

Understanding Trust: Overview Presentations

abramdemski · 16 Apr 2025 18:08 UTC
22 points
0 comments · 1 min read · LW link

Understanding Trust—Overview Presentations

abramdemski · 16 Apr 2025 18:05 UTC
13 points
0 comments · 1 min read · LW link

Telescoping

za3k · 16 Apr 2025 17:05 UTC
13 points
1 comment · 1 min read · LW link
(blog.za3k.com)

Finance and AI Timelines

DAL · 16 Apr 2025 16:55 UTC
5 points
2 comments · 3 min read · LW link

FROM IA CODE TO HUMAN VALUES – A construction from MaxEnt Informational Efficiency in 4 questions

P. João · 16 Apr 2025 16:53 UTC
3 points
0 comments · 7 min read · LW link

AI-enabled coups: a small group could use AI to seize power

16 Apr 2025 16:51 UTC
132 points
23 comments · 7 min read · LW link

Ctrl-Z: Controlling AI Agents via Resampling

16 Apr 2025 16:21 UTC
124 points
0 comments · 20 min read · LW link

Gamify life from BayesianMind

P. João · 16 Apr 2025 16:17 UTC
6 points
2 comments · 1 min read · LW link

Top OpenAI Catastrophic Risk Official Steps Down Abruptly

garrison · 16 Apr 2025 16:04 UTC
14 points
0 comments · 5 min read · LW link
(garrisonlovely.substack.com)

An artistic illustration of Scalable Oversight—“A world apart, neither gods nor mortals”

Marius Adrian Nicoară · 16 Apr 2025 12:41 UTC
1 point
0 comments · 1 min read · LW link

Can LLM-based models do model-based planning?

jylin04 · 16 Apr 2025 12:38 UTC
11 points
1 comment · 2 min read · LW link
(docs.google.com)

The road from human-level to superintelligent AI may be short

16 Apr 2025 8:35 UTC
10 points
0 comments · 2 min read · LW link
(aisafety.info)

Human-level is not the limit

16 Apr 2025 8:33 UTC
23 points
2 comments · 2 min read · LW link
(aisafety.info)

AI may attain human-level soon

16 Apr 2025 8:28 UTC
11 points
0 comments · 2 min read · LW link
(aisafety.info)

AI is advancing fast

16 Apr 2025 8:17 UTC
11 points
0 comments · 2 min read · LW link
(aisafety.info)

How Logic “Really” Works: An Engineering Perspective

Daniil Strizhov · 16 Apr 2025 5:34 UTC
6 points
0 comments · 11 min read · LW link

Opportunity to learn more about AI Innovation & Security Policy

PolicyTakes · 16 Apr 2025 1:35 UTC
2 points
0 comments · 1 min read · LW link

D&D.Sci Tax Day: Adventurers and Assessments

aphyer · 15 Apr 2025 23:43 UTC
47 points
14 comments · 2 min read · LW link

Should AIs be Encouraged to Cooperate?

PeterMcCluskey · 15 Apr 2025 21:57 UTC
13 points
2 comments · 5 min read · LW link
(bayesianinvestor.com)

OpenAI rewrote its Preparedness Framework

Zach Stein-Perlman · 15 Apr 2025 20:00 UTC
36 points
1 comment · 6 min read · LW link

ASI existential risk: Reconsidering Alignment as a Goal

habryka · 15 Apr 2025 19:57 UTC
93 points
14 comments · 19 min read · LW link
(michaelnotebook.com)

Nucleic Acid Observatory Updates, April 2025

jefftk · 15 Apr 2025 18:58 UTC
27 points
0 comments · 4 min read · LW link
(naobservatory.org)

Some OthelloGPT Circuits

Alfred Wong · 15 Apr 2025 18:41 UTC
7 points
0 comments · 7 min read · LW link

The Mirror Problem in AI: Why Language Models Say Whatever You Want

RobT · 15 Apr 2025 18:40 UTC
9 points
2 comments · 3 min read · LW link

What happens when LLMs learn new things? & Continual learning forever.

sunchipsster · 15 Apr 2025 18:38 UTC
4 points
1 comment · 7 min read · LW link

To be legible, evidence of misalignment probably has to be behavioral

ryan_greenblatt · 15 Apr 2025 18:14 UTC
57 points
19 comments · 3 min read · LW link

AISN #51: AI Frontiers

15 Apr 2025 16:01 UTC
8 points
1 comment · 5 min read · LW link
(newsletter.safe.ai)

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

Kaj_Sotala · 15 Apr 2025 15:56 UTC
174 points
52 comments · 18 min read · LW link

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing

Zvi · 15 Apr 2025 15:30 UTC
48 points
3 comments · 12 min read · LW link
(thezvi.wordpress.com)

The real reason AI benchmarks haven’t reflected economic impacts

Noosphere89 · 15 Apr 2025 13:44 UTC
15 points
0 comments · 1 min read · LW link
(epoch.ai)

Map of AI Safety v2

15 Apr 2025 13:04 UTC
64 points
4 comments · 1 min read · LW link

3M Subscriber YouTube Account ‘Channel 5’ Reporting On Rationalism

sakraf · 15 Apr 2025 13:02 UTC
4 points
0 comments · 1 min read · LW link
(youtu.be)

Can SAE steering reveal sandbagging?

15 Apr 2025 12:33 UTC
35 points
3 comments · 4 min read · LW link

Risers for Foot Percussion

jefftk · 15 Apr 2025 11:10 UTC
9 points
2 comments · 1 min read · LW link
(www.jefftk.com)

What empirical research directions has Eliezer commented positively on?

Chris_Leong · 15 Apr 2025 8:53 UTC
8 points
1 comment · 1 min read · LW link

Why Does It Feel Like Something? An Evolutionary Path to Subjectivity

gmax · 15 Apr 2025 8:38 UTC
1 point
18 comments · 10 min read · LW link

How to Defend the Indefensible

Alex Beyman · 15 Apr 2025 7:45 UTC
5 points
1 comment · 21 min read · LW link

A Talmudic Rationalist Cautionary Tale

Noah Birnbaum · 15 Apr 2025 4:11 UTC
13 points
2 comments · 2 min read · LW link

Creating ‘Making God’: a Feature Documentary on risks from AGI

Connor Axiotes · 15 Apr 2025 2:56 UTC
4 points
0 comments · 7 min read · LW link

A Dissent on Honesty

eva_ · 15 Apr 2025 2:43 UTC
44 points
52 comments · 14 min read · LW link

$500 bounty for best short-form fiction about our near future world; $100 for recommending winning piece: new “Art of Near Future World” quarterly art project

Ramon Gonzalez · 15 Apr 2025 0:46 UTC
6 points
1 comment · 2 min read · LW link