10 Principles for Real Alignment

Adriaan

21 Apr 2025 22:18 UTC

−7 points

0 comments, 7 min read

AE Studio is hiring!

Trent Hodgeson

21 Apr 2025 20:35 UTC

20 points

2 comments, 2 min read

$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?

johnswentworth and David Lorell

21 Apr 2025 20:19 UTC

92 points

24 comments, 3 min read

More Than Just A, T, C, and G: Screening for Hidden Dangers in DNA Sequences

sgd

21 Apr 2025 20:12 UTC

1 point

0 comments, 11 min read

The US Executive vs Supreme Court Deportations Clash

NunoSempere

21 Apr 2025 19:56 UTC

44 points

12 comments, 7 min read

(blog.sentinel-team.org)

Podcast on “AI tools for existential security” — transcript

Lizka and fin

21 Apr 2025 19:26 UTC

11 points

0 comments, 43 min read

(pnc.st)

Implications for the likelihood of human extinction from the recent discovery of possible microbial life

Mvolz

21 Apr 2025 19:15 UTC

1 point

2 comments, 1 min read

Key event tracker for AI2027

MarkelKori

21 Apr 2025 19:02 UTC

1 point

0 comments, 1 min read

Load Bearing Magic

winstonBosan

21 Apr 2025 18:53 UTC

8 points

2 comments, 3 min read

The Uses of Complacency

sarahconstantin

21 Apr 2025 18:50 UTC

99 points

5 comments, 8 min read

(sarahconstantin.substack.com)

Feature-Based Analysis of Safety-Relevant Multi-Agent Behavior

Maria Kapros, Ana Kapros and Perusha Moodley

21 Apr 2025 18:12 UTC

10 points

0 comments, 5 min read

Crime and Punishment #1

Zvi

21 Apr 2025 15:30 UTC

41 points

10 comments, 39 min read

(thezvi.wordpress.com)

Improving CNNs with Klein Networks: A Topological Approach to AI

Gunnar Carlsson

21 Apr 2025 15:21 UTC

20 points

4 comments, 5 min read

Eulogy to the Obits

Niko_McCarty and xanderbalwit

21 Apr 2025 14:10 UTC

5 points

1 comment, 10 min read

Research Notes: Running Claude 3.7, Gemini 2.5 Pro, and o3 on Pokémon Red

Julian Bradshaw

21 Apr 2025 3:52 UTC

124 points

20 comments, 14 min read

Not All Beliefs Are Created Equal: Diagnosing Toxic Ideologies

Big_friendly_kiwi

21 Apr 2025 3:18 UTC

23 points

7 comments, 9 min read

AI 2027 is a Bet Against Amdahl’s Law

snewman

21 Apr 2025 3:09 UTC

127 points

57 comments, 9 min read

Severance and the Ethics of the Conscious Agents

Crissman

21 Apr 2025 2:21 UTC

4 points

0 comments, 1 min read

March-April 2025 Progress in Guaranteed Safe AI

Quinn

20 Apr 2025 19:00 UTC

6 points

0 comments, 4 min read

(gsai.substack.com)

How to end credentialism

Yair Halberstadt

20 Apr 2025 18:50 UTC

13 points

15 comments, 8 min read

Spending on Ourselves

jefftk

20 Apr 2025 18:40 UTC

23 points

0 comments, 3 min read

(www.jefftk.com)

Interesting ACX 2024 Book Review Entries

jenn

20 Apr 2025 18:10 UTC

24 points

1 comment, 4 min read

[Question] To what ethics is an AGI actually safely alignable?

StanislavKrym

20 Apr 2025 17:09 UTC

1 point

6 comments, 4 min read

Evaluating Oversight Robustness with Incentivized Reward Hacking

Yoav, Juan V, julianjm and McKennaFitzgerald

20 Apr 2025 16:53 UTC

7 points

2 comments, 15 min read

Developing AI Safety: Bridging the Power-Ethics Gap (Introducing New Concepts)

Ronen Bar

20 Apr 2025 4:40 UTC

3 points

0 comments, 5 min read

(forum.effectivealtruism.org)

Is Gemini now better than Claude at Pokémon?

Julian Bradshaw

19 Apr 2025 23:34 UTC

91 points

12 comments, 5 min read

Impact, agency, and taste

benkuhn

19 Apr 2025 21:10 UTC

206 points

10 comments, 8 min read

(www.benkuhn.net)

Moral patienthood of simulated minds allows uncountable infinity of value on finite hardware

Luck

19 Apr 2025 20:41 UTC

2 points

12 comments, 2 min read

When the Model Starts Talking Like Me: A User-Induced Structural Adaptation Case Study

Junxi

19 Apr 2025 19:40 UTC

3 points

1 comment, 4 min read

A Block-Based Regularization Proposal for Neural Networks

Otto.Dev

19 Apr 2025 18:56 UTC

−8 points

0 comments, 1 min read

How Close We Are to a Complete List of Imprinted Genes

Morpheus

19 Apr 2025 18:37 UTC

30 points

3 comments, 14 min read

(www.tassiloneubauer.com)

Novel Idea Generation in LLMs: Judgment as Bottleneck

Davey Morse

19 Apr 2025 15:37 UTC

−2 points

1 comment, 1 min read

An Introduction to SAEs and their Variants for Mech Interp

Adam Newgas

19 Apr 2025 14:09 UTC

17 points

0 comments, 10 min read

AI Advances and Detection Strategy

jefftk

19 Apr 2025 11:40 UTC

11 points

0 comments, 1 min read

(www.jefftk.com)

The System Didn’t, and Doesn’t Need to be This Way ~ Thomas Paine on Economic Justice

James Stephen Brown

19 Apr 2025 5:16 UTC

2 points

3 comments, 4 min read

(nonzerosum.games)

SecureDrop review

samuelshadrach

19 Apr 2025 4:29 UTC

2 points

0 comments, 5 min read

(samuelshadrach.com)

AI, Alignment & the Art of Relationship Design

Priyanka Bharadwaj

19 Apr 2025 0:47 UTC

6 points

4 comments, 2 min read

Measuring Beliefs of Language Models During Chain-of-Thought Reasoning

Baram Sosis and Tomáš Gavenčiak

18 Apr 2025 22:56 UTC

10 points

0 comments, 13 min read

LLM-based Fact Checking for Popular Posts?

azergante

18 Apr 2025 21:26 UTC

1 point

2 comments, 62 min read

o3 Will Use Its Tools For You

Zvi

18 Apr 2025 21:20 UTC

46 points

3 comments, 45 min read

(thezvi.wordpress.com)

AI Control Methods Literature Review

Ram Potham

18 Apr 2025 21:15 UTC

11 points

1 comment, 9 min read

Consequentialists should have a comprehensive set of deontological beliefs they adhere to

Jay95

18 Apr 2025 20:50 UTC

3 points

2 comments, 1 min read

What Makes an AI Startup “Net Positive” for Safety?

jacquesthibs

18 Apr 2025 20:33 UTC

82 points

23 comments, 2 min read

Alignment Does Not Need to Be Opaque! An Introduction to Feature Steering with Reinforcement Learning

Jeremias Ferrao

18 Apr 2025 19:34 UTC

10 points

0 comments, 10 min read

Evaluating Collaborative AI Performance Subject to Sabotage

Matthew Khoriaty

18 Apr 2025 19:33 UTC

2 points

0 comments, 19 min read

Inside OpenAI’s Controversial Plan to Abandon its Nonprofit Roots

garrison

18 Apr 2025 18:46 UTC

21 points

0 comments, 11 min read

(garrisonlovely.substack.com)

Could LLMs Learn to Detect Bias Autonomously, Like Tesla’s Self-Driving Cars?

Omnipheasant

18 Apr 2025 18:45 UTC

0 points

0 comments, 3 min read

Scaffolding Skills

Screwtape

18 Apr 2025 17:39 UTC

36 points

9 comments, 4 min read

The Case for White Box Control

J Rosser

18 Apr 2025 16:10 UTC

5 points

1 comment, 5 min read

[Rockville] Rationalist Shabbat

maia

18 Apr 2025 15:38 UTC

8 points

0 comments, 1 min read