LLM Pareto Frontier But Live

winstonBosan

24 Apr 2025 21:22 UTC

8 points

0 comments, 1 min read, LW link

Modifying LLM Beliefs with Synthetic Document Finetuning

RowanWang, Johannes Treutlein, Avery, Ethan Perez, Fabien Roger and Sam Marks

24 Apr 2025 21:15 UTC

77 points

12 comments, 2 min read, LW link

(alignment.anthropic.com)

This prompt (sometimes) makes ChatGPT think about terrorist organisations

jakub_krys

24 Apr 2025 21:15 UTC

30 points

13 comments, 1 min read, LW link

Severe control over AI agents as a tool for mass-surveillance

Andrey Seryakov

24 Apr 2025 20:27 UTC

2 points

0 comments, 3 min read, LW link

Token and Taboo

Guive

24 Apr 2025 20:17 UTC

31 points

6 comments, 4 min read, LW link

(guive.substack.com)

Trouble at Miningtown: Prologue

Quinn

24 Apr 2025 19:09 UTC

19 points

0 comments, 4 min read, LW link

Training-time schemers vs behavioral schemers

Alex Mallen

24 Apr 2025 19:07 UTC

64 points

9 comments, 6 min read, LW link

Reward hacking is becoming more sophisticated and deliberate in frontier LLMs

Kei Nishimura-Gasparian

24 Apr 2025 16:03 UTC

97 points

7 comments, 1 min read, LW link

Finding an Error-Detection Feature in DeepSeek-R1

keith_wynroe

24 Apr 2025 16:03 UTC

23 points

0 comments, 7 min read, LW link

Anticipating AI: Keeping Up With What We Build

Alvin Ånestrand

24 Apr 2025 15:23 UTC

2 points

0 comments, 11 min read, LW link

(forecastingaifutures.substack.com)

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Matrice Jacobine

24 Apr 2025 14:11 UTC

12 points

4 comments, 1 min read, LW link

(limit-of-rlvr.github.io)

Academia as a happy place?

jow and pchvykov

24 Apr 2025 14:03 UTC

9 points

0 comments, 19 min read, LW link

“The Era of Experience” has an unsolved technical alignment problem

Steven Byrnes

24 Apr 2025 13:57 UTC

116 points

48 comments, 23 min read, LW link

AI #113: The o3 Era Begins

Zvi

24 Apr 2025 13:40 UTC

38 points

4 comments, 62 min read, LW link

(thezvi.wordpress.com)

The Intelligence Curse: an essay series

L Rudolf L and lukedrago

24 Apr 2025 12:59 UTC

85 points

10 comments, 2 min read, LW link

Personal evaluation of LLMs, through chess

Karthik Tadepalli

24 Apr 2025 7:01 UTC

20 points

4 comments, 2 min read, LW link

Intelligence explosion

samuelshadrach

24 Apr 2025 6:35 UTC

2 points

0 comments, 4 min read, LW link

(samuelshadrach.com)

Cognitive Dissonance is Mentally Taxing

SorenJ

24 Apr 2025 0:38 UTC

4 points

0 comments, 4 min read, LW link

My Favorite Productivity Blog Posts

Parker Conley

24 Apr 2025 0:32 UTC

56 points

0 comments, 1 min read, LW link

(parconley.com)

What Physically Distinguishes a Brain with False Beliefs Using a Swimming Pool Example

YanLyutnev

24 Apr 2025 0:01 UTC

6 points

0 comments, 7 min read, LW link

OpenAI Alums, Nobel Laureates Urge Regulators to Save Company’s Nonprofit Structure

garrison

23 Apr 2025 23:01 UTC

66 points

0 comments, 8 min read, LW link

(garrisonlovely.substack.com)

What AI safety plans are there?

MichaelDickens

23 Apr 2025 22:58 UTC

18 points

4 comments, 1 min read, LW link

o3 Is a Lying Liar

Zvi

23 Apr 2025 20:00 UTC

86 points

26 comments, 9 min read, LW link

(thezvi.wordpress.com)

Putting up Bumpers

Sam Bowman

23 Apr 2025 16:05 UTC

58 points

14 comments, 2 min read, LW link

The AI Belief-Consistency Letter

Knight Lee

23 Apr 2025 12:01 UTC

−6 points

15 comments, 4 min read, LW link

Jaan Tallinn’s 2024 Philanthropy Overview

jaan

23 Apr 2025 11:06 UTC

228 points

8 comments, 1 min read, LW link

(jaan.info)

[Question] Are we “being poisoned”?

Tigerlily

23 Apr 2025 5:11 UTC

16 points

2 comments, 2 min read, LW link

To Understand History, Keep Former Population Distributions In Mind

Arjun Panickssery

23 Apr 2025 4:51 UTC

254 points

13 comments, 2 min read, LW link

(arjunpanickssery.substack.com)

Fish and Faces

Eggs

23 Apr 2025 3:35 UTC

8 points

6 comments, 2 min read, LW link

Is alignment reducible to becoming more coherent?

Cole Wyeth

22 Apr 2025 23:47 UTC

19 points

0 comments, 3 min read, LW link

The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety

Katalina Hernandez

22 Apr 2025 20:39 UTC

62 points

13 comments, 9 min read, LW link

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

David Guzman Piedrahita, Yongjin Yang and Zhijing Jin

22 Apr 2025 19:25 UTC

24 points

3 comments, 5 min read, LW link

Alignment from equivariance II—language equivariance as a way of figuring out what an AI “means”

hamishtodd1

22 Apr 2025 19:04 UTC

5 points

1 comment, 3 min read, LW link

Manifund 2025 Regrants

Austin Chen

22 Apr 2025 17:36 UTC

21 points

0 comments, 5 min read, LW link

(manifund.substack.com)

AISN#52: An Expert Virology Benchmark

Corin Katzke and Dan H

22 Apr 2025 17:08 UTC

6 points

0 comments, 4 min read, LW link

(newsletter.safe.ai)

Intuition in AI

Priyanka Bharadwaj

22 Apr 2025 15:15 UTC

0 points

2 comments, 2 min read, LW link

Problems with Bayesianism: A Socratic Dialogue

B Jacobs

22 Apr 2025 14:09 UTC

3 points

1 comment, 14 min read, LW link

(bobjacobs.substack.com)

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt

Joel Z. Leibo, Wilcunningham, Seb Krier and Manfred Diaz

22 Apr 2025 13:21 UTC

51 points

24 comments, 25 min read, LW link

You Better Mechanize

Zvi

22 Apr 2025 13:10 UTC

76 points

6 comments, 20 min read, LW link

(thezvi.wordpress.com)

Experimental testing: can I treat myself as a random sample?

avturchin

22 Apr 2025 12:34 UTC

9 points

41 comments, 4 min read, LW link

Family-line selection optimizer

lemonhope

22 Apr 2025 7:16 UTC

2 points

0 comments, 1 min read, LW link

Accountability Sinks

Martin Sustrik

22 Apr 2025 5:00 UTC

462 points

59 comments, 15 min read, LW link

(250bpm.substack.com)

Most AI value will come from broad automation, not from R&D

Matthew Barnett

22 Apr 2025 3:22 UTC

10 points

6 comments, 2 min read, LW link

(epoch.ai)

A Letter to His Highness Louis XV, the King of France

testingthewaters

22 Apr 2025 0:51 UTC

2 points

0 comments, 1 min read, LW link

(aclevername.substack.com)

10 Principles for Real Alignment

Adriaan

21 Apr 2025 22:18 UTC

−7 points

0 comments, 7 min read, LW link

AE Studio is hiring!

Trent Hodgeson

21 Apr 2025 20:35 UTC

20 points

2 comments, 2 min read, LW link

$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?

johnswentworth and David Lorell

21 Apr 2025 20:19 UTC

93 points

24 comments, 3 min read, LW link

More Than Just A, T, C, and G: Screening for Hidden Dangers in DNA Sequences

sgd

21 Apr 2025 20:12 UTC

1 point

0 comments, 11 min read, LW link

The US Executive vs Supreme Court Deportations Clash

NunoSempere

21 Apr 2025 19:56 UTC

44 points

12 comments, 7 min read, LW link

(blog.sentinel-team.org)

Podcast on “AI tools for existential security” — transcript

Lizka and fin

21 Apr 2025 19:26 UTC

11 points

0 comments, 43 min read, LW link

(pnc.st)