Correcting Deceptive Alignment using a Deontological Approach

JeaniceK

14 Apr 2025 22:07 UTC

8 points

0 comments · 7 min read

Religious Persistence: A Missing Primitive for Robust Alignment

lauriewired

14 Apr 2025 22:03 UTC

6 points

3 comments · 8 min read

The 4-Minute Mile Effect

Parker Conley

14 Apr 2025 21:41 UTC

32 points

6 comments · 2 min read

(parconley.com)

Lightning Talks!

nathandunkerley

14 Apr 2025 20:39 UTC

1 point

0 comments · 1 min read

The Bell Curve of Bad Behavior

Screwtape

14 Apr 2025 19:58 UTC

57 points

6 comments · 10 min read

Sentinel’s Global Risks Weekly Roundup #15/2025: Tariff yoyo, OpenAI slashing safety testing, Iran nuclear programme negotiations, 1K H5N1 confirmed herd infections.

NunoSempere

14 Apr 2025 19:11 UTC

42 points

0 comments · 2 min read

(blog.sentinel-team.org)

Sam Altman’s sister claims Sam sexually abused her—Part 7: Timeline, continued

pythagoras5015

14 Apr 2025 17:43 UTC

2 points

0 comments · 36 min read

Sam Altman’s sister claims Sam sexually abused her—Part 8: Timeline, continued

pythagoras5015

14 Apr 2025 17:42 UTC

4 points

0 comments · 71 min read

Frontier AI Models Still Fail at Basic Physical Tasks: A Manufacturing Case Study

Adam Karvonen

14 Apr 2025 17:38 UTC

158 points

42 comments · 7 min read

(adamkarvonen.github.io)

How to evaluate control measures for LLM agents? A trajectory from today to superintelligence

Tomek Korbak, Mikita Balesni, Buck and Geoffrey Irving

14 Apr 2025 16:45 UTC

29 points

1 comment · 2 min read

Applications Open for Impact Accelerator Program for Experienced Professionals

Clark Wisenbaker

14 Apr 2025 16:27 UTC

1 point

0 comments · 3 min read

The Last Light

Bridgett Kay

14 Apr 2025 15:41 UTC

31 points

2 comments · 4 min read

Offer: Team Conflict Counseling for AI Safety Orgs

Severin T. Seehrich

14 Apr 2025 15:17 UTC

19 points

1 comment · 1 min read

Slopworld 2035: The dangers of mediocre AI

titotal

14 Apr 2025 13:14 UTC

22 points

6 comments · 29 min read

(titotal.substack.com)

Try training token-level probes

StefanHex

14 Apr 2025 11:56 UTC

47 points

6 comments · 8 min read

Monthly Roundup #29: April 2025

Zvi

14 Apr 2025 11:50 UTC

23 points

7 comments · 24 min read

(thezvi.wordpress.com)

A Solution to Sandbagging and other Self-Provable Misalignment: Constitutional AI Detectives

Knight Lee

14 Apr 2025 10:27 UTC

−3 points

2 comments · 4 min read

One-shot steering vectors cause emergent misalignment, too

Jacob Dunefsky

14 Apr 2025 6:40 UTC

98 points

6 comments · 11 min read

Unbendable Arm as Test Case for Religious Belief

Ivan Vendrov

14 Apr 2025 1:57 UTC

28 points

39 comments · 2 min read

(nothinghuman.substack.com)

Sam Altman’s sister claims Sam sexually abused her—Part 5: Timeline, continued

pythagoras5015

14 Apr 2025 1:00 UTC

1 point

0 comments · 125 min read

Luna Lovegood and the Chamber of Secrets, Part 5

Kongo Landwalker and lsusr

14 Apr 2025 0:10 UTC

4 points

0 comments · 3 min read

Sam Altman’s sister claims Sam sexually abused her—Part 4: Timeline, continued

pythagoras5015

13 Apr 2025 23:41 UTC

1 point

0 comments · 51 min read

The Structure of the Pain of Change

ReverendBayes

13 Apr 2025 21:51 UTC

7 points

0 comments · 10 min read

Luna Lovegood and the Chamber of Secrets, Part 4

Kongo Landwalker and lsusr

13 Apr 2025 20:55 UTC

3 points

0 comments · 4 min read

Thoughts on the Double Impact Project

Mati_Roy

13 Apr 2025 19:07 UTC

27 points

14 comments · 2 min read

Intro to Multi-Agent Safety

james__p

13 Apr 2025 17:40 UTC

12 points

0 comments · 5 min read

Vestigial reasoning in RL

Caleb Biddulph

13 Apr 2025 15:40 UTC

54 points

8 comments · 9 min read

Four Types of Disagreement

silentbob

13 Apr 2025 11:22 UTC

50 points

4 comments · 5 min read

How I switched careers from software engineer to AI policy operations

Lucie Philippon

13 Apr 2025 6:37 UTC

58 points

1 comment · 5 min read

Steelmanning heuristic arguments

Dmitry Vaintrob

13 Apr 2025 1:09 UTC

77 points

0 comments · 17 min read

MONA: Three Month Later—Updates and Steganography Without Optimization Pressure

David Lindner and Vikrant Varma

12 Apr 2025 23:15 UTC

31 points

0 comments · 5 min read

The Era of the Dividual—are we falling apart?

James Stephen Brown

12 Apr 2025 22:35 UTC

3 points

2 comments · 4 min read

Commitment Races are a technical problem ASI can easily solve

Knight Lee

12 Apr 2025 22:22 UTC

7 points

6 comments · 6 min read

The King’s Gift: How Institutions Rebrand Responsibility into Illusion

Hu Yichao

12 Apr 2025 19:38 UTC

1 point

0 comments · 1 min read

Experts have it easy

beyarkay

12 Apr 2025 19:32 UTC

23 points

3 comments · 9 min read

find_purpose.exe

heatdeathandtaxes

12 Apr 2025 19:31 UTC

−1 points

0 comments · 5 min read

(heatdeathandtaxes.substack.com)

The Cynic Wasps in the Beehive

mempko

12 Apr 2025 19:30 UTC

−3 points

0 comments · 1 min read

(blog.mempko.com)

Luna Lovegood and the Chamber of Secrets, Part 3

Kongo Landwalker and lsusr

12 Apr 2025 19:20 UTC

3 points

0 comments · 2 min read

[Question] What is autism?

Adam Zerner

12 Apr 2025 18:12 UTC

18 points

7 comments · 1 min read

College Advice For People Like Me

henryj

12 Apr 2025 14:36 UTC

54 points

5 comments · 17 min read

(www.henryjosephson.com)

Why does LW not put much more focus on AI governance and outreach?

Severin T. Seehrich and Benjamin Schmidt

12 Apr 2025 14:24 UTC

78 points

31 comments · 2 min read

What are good safety standards for open source AIs from China?

ChristianKl

12 Apr 2025 13:06 UTC

10 points

2 comments · 1 min read

Will US tariffs push data centers for large model training offshore?

ChristianKl

12 Apr 2025 12:47 UTC

20 points

3 comments · 1 min read

Self propagating story.

Canaletto

12 Apr 2025 12:32 UTC

3 points

0 comments · 8 min read

Calling Bullshit—the Cheatsheet

Niklas Lehmann

12 Apr 2025 11:43 UTC

13 points

5 comments · 2 min read

The Internal Model Principle: A Straightforward Explanation

Alfred Harwood

12 Apr 2025 10:58 UTC

23 points

6 comments · 19 min read

ACX Spring Meetup 2025 @ Klang Valley, Malaysia

Yi-Yang

12 Apr 2025 7:31 UTC

2 points

0 comments · 1 min read

Distributed whistleblowing

samuelshadrach

12 Apr 2025 6:36 UTC

5 points

5 comments · 4 min read

(samuelshadrach.com)

[Question] How likely are the USA to decay and how will it influence the AI development?

StanislavKrym

12 Apr 2025 4:42 UTC

10 points

0 comments · 1 min read

[Question] Does this game have a name?

Mis-Understandings

12 Apr 2025 1:52 UTC

4 points

4 comments · 1 min read