D&D.Sci Tax Day: Adventurers and Assessments

aphyer

15 Apr 2025 23:43 UTC

47 points

14 comments, 2 min read

Should AIs be Encouraged to Cooperate?

PeterMcCluskey

15 Apr 2025 21:57 UTC

13 points

2 comments, 5 min read

(bayesianinvestor.com)

OpenAI rewrote its Preparedness Framework

Zach Stein-Perlman

15 Apr 2025 20:00 UTC

37 points

1 comment, 6 min read

ASI existential risk: Reconsidering Alignment as a Goal

habryka

15 Apr 2025 19:57 UTC

95 points

14 comments, 19 min read

(michaelnotebook.com)

Nucleic Acid Observatory Updates, April 2025

jefftk

15 Apr 2025 18:58 UTC

27 points

0 comments, 4 min read

(naobservatory.org)

Some OthelloGPT Circuits

Alfred Wong

15 Apr 2025 18:41 UTC

7 points

0 comments, 7 min read

The Mirror Problem in AI: Why Language Models Say Whatever You Want

RobT

15 Apr 2025 18:40 UTC

9 points

2 comments, 3 min read

What happens when LLMs learn new things? & Continual learning forever.

sunchipsster

15 Apr 2025 18:38 UTC

4 points

1 comment, 7 min read

To be legible, evidence of misalignment probably has to be behavioral

ryan_greenblatt

15 Apr 2025 18:14 UTC

58 points

19 comments, 3 min read

AISN #51: AI Frontiers

Corin Katzke and Dan H

15 Apr 2025 16:01 UTC

8 points

1 comment, 5 min read

(newsletter.safe.ai)

Surprising LLM reasoning failures make me think we still need qualitative breakthroughs for AGI

Kaj_Sotala

15 Apr 2025 15:56 UTC

174 points

52 comments, 18 min read

OpenAI #13: Altman at TED and OpenAI Cutting Corners on Safety Testing

Zvi

15 Apr 2025 15:30 UTC

48 points

3 comments, 12 min read

(thezvi.wordpress.com)

The real reason AI benchmarks haven’t reflected economic impacts

Noosphere89

15 Apr 2025 13:44 UTC

15 points

0 comments, 1 min read

(epoch.ai)

Map of AI Safety v2

Bryce Robertson, Søren Elverlin and honeybee

15 Apr 2025 13:04 UTC

64 points

4 comments, 1 min read

3M Subscriber YouTube Account ‘Channel 5’ Reporting On Rationalism

sakraf

15 Apr 2025 13:02 UTC

4 points

0 comments, 1 min read

(youtu.be)

Can SAE steering reveal sandbagging?

jordinne, Hoang Khiem, Felix Hofstätter and Cleo Nardo

15 Apr 2025 12:33 UTC

36 points

3 comments, 4 min read

Risers for Foot Percussion

jefftk

15 Apr 2025 11:10 UTC

9 points

2 comments, 1 min read

(www.jefftk.com)

What empirical research directions has Eliezer commented positively on?

Chris_Leong

15 Apr 2025 8:53 UTC

8 points

1 comment, 1 min read

Why Does It Feel Like Something? An Evolutionary Path to Subjectivity

gmax

15 Apr 2025 8:38 UTC

1 point

18 comments, 10 min read

How to Defend the Indefensible

Alex Beyman

15 Apr 2025 7:45 UTC

5 points

1 comment, 25 min read

A Talmudic Rationalist Cautionary Tale

Noah Birnbaum

15 Apr 2025 4:11 UTC

13 points

2 comments, 2 min read

Creating ‘Making God’: a Feature Documentary on risks from AGI

Connor Axiotes

15 Apr 2025 2:56 UTC

4 points

0 comments, 7 min read

A Dissent on Honesty

eva_

15 Apr 2025 2:43 UTC

48 points

54 comments, 14 min read

$500 bounty for best short-form fiction about our near future world; $100 for recommending winning piece: new “Art of Near Future World” quarterly art project

Ramon Gonzalez

15 Apr 2025 0:46 UTC

6 points

1 comment, 2 min read

What if there was a nuke in Manhattan and why that could be a good thing

Ratburn

15 Apr 2025 0:19 UTC

3 points

11 comments, 3 min read

Nihilism Is Not Enough By Peter Thiel

shawkisukkar

15 Apr 2025 0:13 UTC

6 points

4 comments, 1 min read

(www.nihilismisnotenough.com)

Correcting Deceptive Alignment using a Deontological Approach

JeaniceK

14 Apr 2025 22:07 UTC

9 points

0 comments, 7 min read

Religious Persistence: A Missing Primitive for Robust Alignment

lauriewired

14 Apr 2025 22:03 UTC

6 points

3 comments, 8 min read

The 4-Minute Mile Effect

Parker Conley

14 Apr 2025 21:41 UTC

32 points

6 comments, 2 min read

(parconley.com)

Lightning Talks!

nathandunkerley

14 Apr 2025 20:39 UTC

1 point

0 comments, 1 min read

The Bell Curve of Bad Behavior

Screwtape

14 Apr 2025 19:58 UTC

59 points

6 comments, 10 min read

Sentinel’s Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.

NunoSempere

14 Apr 2025 19:11 UTC

42 points

0 comments, 2 min read

(blog.sentinel-team.org)

Sam Altman’s sister claims Sam sexually abused her—Part 7: Timeline, continued

pythagoras5015

14 Apr 2025 17:43 UTC

2 points

0 comments, 36 min read

Sam Altman’s sister claims Sam sexually abused her—Part 8: Timeline, continued

pythagoras5015

14 Apr 2025 17:42 UTC

4 points

0 comments, 71 min read

Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

Adam Karvonen

14 Apr 2025 17:38 UTC

165 points

43 comments, 7 min read

(adamkarvonen.github.io)

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence

Tomek Korbak, Mikita Balesni, Buck and Geoffrey Irving

14 Apr 2025 16:45 UTC

29 points

1 comment, 2 min read

Applications Open for Impact Accelerator Program for Experienced Professionals

High Impact Professionals

14 Apr 2025 16:27 UTC

1 point

0 comments, 3 min read

The Last Light

Bridgett Kay

14 Apr 2025 15:41 UTC

31 points

2 comments, 4 min read

Offer: Team Conflict Counseling for AI Safety Orgs

Severin T. Seehrich

14 Apr 2025 15:17 UTC

19 points

1 comment, 1 min read

Slopworld 2035: The dangers of mediocre AI

titotal

14 Apr 2025 13:14 UTC

22 points

6 comments, 29 min read

(titotal.substack.com)

Try training token-level probes

StefanHex

14 Apr 2025 11:56 UTC

47 points

6 comments, 8 min read

Monthly Roundup #29: April 2025

Zvi

14 Apr 2025 11:50 UTC

23 points

7 comments, 24 min read

(thezvi.wordpress.com)

A Solution to Sandbagging and other Self-Provable Misalignment: Constitutional AI Detectives

Knight Lee

14 Apr 2025 10:27 UTC

−3 points

2 comments, 4 min read

One-shot steering vectors cause emergent misalignment, too

Jacob Dunefsky

14 Apr 2025 6:40 UTC

99 points

6 comments, 11 min read

Unbendable Arm as Test Case for Religious Belief

Ivan Vendrov

14 Apr 2025 1:57 UTC

28 points

39 comments, 2 min read

(nothinghuman.substack.com)

Sam Altman’s sister claims Sam sexually abused her—Part 5: Timeline, continued

pythagoras5015

14 Apr 2025 1:00 UTC

1 point

0 comments, 125 min read

Luna Lovegood and the Chamber of Secrets, Part 5

Kongo Landwalker and lsusr

14 Apr 2025 0:10 UTC

4 points

0 comments, 3 min read

Sam Altman’s sister claims Sam sexually abused her—Part 4: Timeline, continued

pythagoras5015

13 Apr 2025 23:41 UTC

1 point

0 comments, 51 min read

The Structure of the Pain of Change

ReverendBayes

13 Apr 2025 21:51 UTC

7 points

0 comments, 10 min read

Luna Lovegood and the Chamber of Secrets, Part 4

Kongo Landwalker and lsusr

13 Apr 2025 20:55 UTC

3 points

0 comments, 4 min read