Kodo and Din

Screwtape · 26 Apr 2025 18:54 UTC
7 points
10 comments · 4 min read · LW link

We should try to automate AI safety work asap

Marius Hobbhahn · 26 Apr 2025 16:35 UTC
113 points
10 comments · 15 min read · LW link

AI Safety & Entrepreneurship v1.0

Chris_Leong · 26 Apr 2025 14:37 UTC
16 points
0 comments · 2 min read · LW link

Reconsidering Money: The Case for Freigeld in the Digital Age and a Networked Future

henophilia · 26 Apr 2025 12:54 UTC
−22 points
0 comments · 5 min read · LW link
(blog.hermesloom.org)

How I Think About My Research Process: Explore, Understand, Distill

Neel Nanda · 26 Apr 2025 10:31 UTC
56 points
4 comments · 8 min read · LW link

Don’t you mean “the most *conditionally* forbidden technique?”

Knight Lee · 26 Apr 2025 3:45 UTC
14 points
0 comments · 3 min read · LW link

Land with no aunties

thellimist · 26 Apr 2025 1:20 UTC
6 points
0 comments · 1 min read · LW link
(kanyilmaz.me)

AI 2027 Thoughts

PeterMcCluskey · 26 Apr 2025 0:00 UTC
29 points
2 comments · 6 min read · LW link
(bayesianinvestor.com)

Who’s Working On It? AI-Controlled Experiments

sarahconstantin · 25 Apr 2025 21:40 UTC
19 points
0 comments · 1 min read · LW link
(sarahconstantin.substack.com)

[Linkpost] AI War seems unlikely to prevent AI Doom

thenoviceoof · 25 Apr 2025 20:44 UTC
7 points
6 comments · 2 min read · LW link
(thenoviceoof.com)

Worries About AI Are Usually Complements Not Substitutes

Zvi · 25 Apr 2025 20:00 UTC
45 points
3 comments · 4 min read · LW link
(thezvi.wordpress.com)

Why would AI companies use human-level AI to do alignment research?

MichaelDickens · 25 Apr 2025 19:12 UTC
24 points
8 comments · 2 min read · LW link

How Democratic Is Effective Altruism — Really?

B Jacobs · 25 Apr 2025 16:02 UTC
0 points
2 comments · 2 min read · LW link
(bobjacobs.substack.com)

Will Programmer Compensation Decouple from Productivity?

Gordon Seidoh Worley · 25 Apr 2025 15:32 UTC
15 points
7 comments · 2 min read · LW link
(uncertainupdates.substack.com)

Zstd Window Size

jefftk · 25 Apr 2025 14:40 UTC
12 points
1 comment · 2 min read · LW link
(www.jefftk.com)

List of petitions against OpenAI’s for-profit move

Remmelt · 25 Apr 2025 10:03 UTC
5 points
1 comment · 1 min read · LW link

A review of “Why Did Environmentalism Become Partisan?”

David Scott Krueger (formerly: capybaralet) · 25 Apr 2025 5:12 UTC
24 points
0 comments · 4 min read · LW link

LLM Pareto Frontier But Live

winstonBosan · 24 Apr 2025 21:22 UTC
8 points
0 comments · 1 min read · LW link

Modifying LLM Beliefs with Synthetic Document Finetuning

24 Apr 2025 21:15 UTC
70 points
12 comments · 2 min read · LW link
(alignment.anthropic.com)

This prompt (sometimes) makes ChatGPT think about terrorist organisations

jakub_krys · 24 Apr 2025 21:15 UTC
30 points
13 comments · 1 min read · LW link

Severe control over AI agents as a tool for mass-surveillance

Andrey Seryakov · 24 Apr 2025 20:27 UTC
2 points
0 comments · 3 min read · LW link

Token and Taboo

Guive · 24 Apr 2025 20:17 UTC
31 points
6 comments · 4 min read · LW link
(guive.substack.com)

Trouble at Miningtown: Prologue

Quinn · 24 Apr 2025 19:09 UTC
19 points
0 comments · 4 min read · LW link

Training-time schemers vs behavioral schemers

Alex Mallen · 24 Apr 2025 19:07 UTC
44 points
9 comments · 6 min read · LW link

Reward hacking is becoming more sophisticated and deliberate in frontier LLMs

Kei · 24 Apr 2025 16:03 UTC
95 points
6 comments · 1 min read · LW link

Finding an Error-Detection Feature in DeepSeek-R1

keith_wynroe · 24 Apr 2025 16:03 UTC
15 points
0 comments · 7 min read · LW link

Anticipating AI: Keeping Up With What We Build

Alvin Ånestrand · 24 Apr 2025 15:23 UTC
2 points
0 comments · 11 min read · LW link
(forecastingaifutures.substack.com)

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Matrice Jacobine · 24 Apr 2025 14:11 UTC
12 points
4 comments · 1 min read · LW link
(limit-of-rlvr.github.io)

Academia as a happy place?

24 Apr 2025 14:03 UTC
9 points
0 comments · 19 min read · LW link

“The Era of Experience” has an unsolved technical alignment problem

Steven Byrnes · 24 Apr 2025 13:57 UTC
115 points
48 comments · 23 min read · LW link

AI #113: The o3 Era Begins

Zvi · 24 Apr 2025 13:40 UTC
38 points
4 comments · 62 min read · LW link
(thezvi.wordpress.com)

The Intelligence Curse: an essay series

24 Apr 2025 12:59 UTC
72 points
10 comments · 2 min read · LW link

Personal evaluation of LLMs, through chess

Karthik Tadepalli · 24 Apr 2025 7:01 UTC
20 points
4 comments · 2 min read · LW link

Intelligence explosion

samuelshadrach · 24 Apr 2025 6:35 UTC
2 points
0 comments · 4 min read · LW link
(samuelshadrach.com)

Cognitive Dissonance is Mentally Taxing

SorenJ · 24 Apr 2025 0:38 UTC
4 points
0 comments · 4 min read · LW link

My Favorite Productivity Blog Posts

Parker Conley · 24 Apr 2025 0:32 UTC
53 points
0 comments · 1 min read · LW link
(parconley.com)

What Physically Distinguishes a Brain with False Beliefs Using a Swimming Pool Example

YanLyutnev · 24 Apr 2025 0:01 UTC
6 points
0 comments · 7 min read · LW link

OpenAI Alums, Nobel Laureates Urge Regulators to Save Company’s Nonprofit Structure

garrison · 23 Apr 2025 23:01 UTC
66 points
0 comments · 8 min read · LW link
(garrisonlovely.substack.com)

What AI safety plans are there?

MichaelDickens · 23 Apr 2025 22:58 UTC
16 points
3 comments · 1 min read · LW link

o3 Is a Lying Liar

Zvi · 23 Apr 2025 20:00 UTC
84 points
26 comments · 9 min read · LW link
(thezvi.wordpress.com)

Putting up Bumpers

Sam Bowman · 23 Apr 2025 16:05 UTC
54 points
14 comments · 2 min read · LW link

The AI Belief-Consistency Letter

Knight Lee · 23 Apr 2025 12:01 UTC
−6 points
15 comments · 4 min read · LW link

Jaan Tallinn’s 2024 Philanthropy Overview

jaan · 23 Apr 2025 11:06 UTC
227 points
8 comments · 1 min read · LW link
(jaan.info)

[Question] Are we “being poisoned”?

Tigerlily · 23 Apr 2025 5:11 UTC
16 points
2 comments · 2 min read · LW link

To Understand History, Keep Former Population Distributions In Mind

Arjun Panickssery · 23 Apr 2025 4:51 UTC
240 points
13 comments · 2 min read · LW link
(arjunpanickssery.substack.com)

Fish and Faces

Eggs · 23 Apr 2025 3:35 UTC
8 points
6 comments · 2 min read · LW link

Is alignment reducible to becoming more coherent?

Cole Wyeth · 22 Apr 2025 23:47 UTC
19 points
0 comments · 3 min read · LW link

The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety

Katalina Hernandez · 22 Apr 2025 20:39 UTC
62 points
13 comments · 9 min read · LW link

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

22 Apr 2025 19:25 UTC
24 points
3 comments · 5 min read · LW link

Alignment from equivariance II—language equivariance as a way of figuring out what an AI “means”

hamishtodd1 · 22 Apr 2025 19:04 UTC
5 points
0 comments · 3 min read · LW link