Kodo and Din

Screwtape · 26 Apr 2025 18:54 UTC
7 points
10 comments · 4 min read · LW link

We should try to automate AI safety work asap

Marius Hobbhahn · 26 Apr 2025 16:35 UTC
113 points
10 comments · 15 min read · LW link

AI Safety & Entrepreneurship v1.0

Chris_Leong · 26 Apr 2025 14:37 UTC
16 points
0 comments · 2 min read · LW link

Reconsidering Money: The Case for Freigeld in the Digital Age and a Networked Future

henophilia · 26 Apr 2025 12:54 UTC
−22 points
0 comments · 5 min read · LW link
(blog.hermesloom.org)

How I Think About My Research Process: Explore, Understand, Distill

Neel Nanda · 26 Apr 2025 10:31 UTC
56 points
4 comments · 8 min read · LW link

Don’t you mean “the most *conditionally* forbidden technique?”

Knight Lee · 26 Apr 2025 3:45 UTC
14 points
0 comments · 3 min read · LW link

Land with no aunties

thellimist · 26 Apr 2025 1:20 UTC
6 points
0 comments · 1 min read · LW link
(kanyilmaz.me)

AI 2027 Thoughts

PeterMcCluskey · 26 Apr 2025 0:00 UTC
29 points
2 comments · 6 min read · LW link
(bayesianinvestor.com)

Who’s Working On It? AI-Controlled Experiments

sarahconstantin · 25 Apr 2025 21:40 UTC
19 points
0 comments · 1 min read · LW link
(sarahconstantin.substack.com)

[Linkpost] AI War seems unlikely to prevent AI Doom

thenoviceoof · 25 Apr 2025 20:44 UTC
7 points
6 comments · 2 min read · LW link
(thenoviceoof.com)

Worries About AI Are Usually Complements Not Substitutes

Zvi · 25 Apr 2025 20:00 UTC
45 points
3 comments · 4 min read · LW link
(thezvi.wordpress.com)

Why would AI companies use human-level AI to do alignment research?

MichaelDickens · 25 Apr 2025 19:12 UTC
24 points
8 comments · 2 min read · LW link

How Democratic Is Effective Altruism — Really?

B Jacobs · 25 Apr 2025 16:02 UTC
0 points
2 comments · 2 min read · LW link
(bobjacobs.substack.com)

Will Programmer Compensation Decouple from Productivity?

Gordon Seidoh Worley · 25 Apr 2025 15:32 UTC
15 points
7 comments · 2 min read · LW link
(uncertainupdates.substack.com)

Zstd Window Size

jefftk · 25 Apr 2025 14:40 UTC
12 points
1 comment · 2 min read · LW link
(www.jefftk.com)

List of petitions against OpenAI’s for-profit move

Remmelt · 25 Apr 2025 10:03 UTC
5 points
1 comment · 1 min read · LW link

A review of “Why Did Environmentalism Become Partisan?”

David Scott Krueger (formerly: capybaralet) · 25 Apr 2025 5:12 UTC
24 points
0 comments · 4 min read · LW link

LLM Pareto Frontier But Live

winstonBosan · 24 Apr 2025 21:22 UTC
8 points
0 comments · 1 min read · LW link

Modifying LLM Beliefs with Synthetic Document Finetuning

24 Apr 2025 21:15 UTC
70 points
12 comments · 2 min read · LW link
(alignment.anthropic.com)

This prompt (sometimes) makes ChatGPT think about terrorist organisations

jakub_krys · 24 Apr 2025 21:15 UTC
30 points
13 comments · 1 min read · LW link

Severe control over AI agents as a tool for mass-surveillance

Andrey Seryakov · 24 Apr 2025 20:27 UTC
2 points
0 comments · 3 min read · LW link

Token and Taboo

Guive · 24 Apr 2025 20:17 UTC
31 points
6 comments · 4 min read · LW link
(guive.substack.com)

Trouble at Miningtown: Prologue

Quinn · 24 Apr 2025 19:09 UTC
19 points
0 comments · 4 min read · LW link

Training-time schemers vs behavioral schemers

Alex Mallen · 24 Apr 2025 19:07 UTC
44 points
9 comments · 6 min read · LW link

Reward hacking is becoming more sophisticated and deliberate in frontier LLMs

Kei · 24 Apr 2025 16:03 UTC
95 points
6 comments · 1 min read · LW link

Finding an Error-Detection Feature in DeepSeek-R1

keith_wynroe · 24 Apr 2025 16:03 UTC
15 points
0 comments · 7 min read · LW link

Anticipating AI: Keeping Up With What We Build

Alvin Ånestrand · 24 Apr 2025 15:23 UTC
2 points
0 comments · 11 min read · LW link
(forecastingaifutures.substack.com)

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Matrice Jacobine · 24 Apr 2025 14:11 UTC
12 points
4 comments · 1 min read · LW link
(limit-of-rlvr.github.io)

Academia as a happy place?

24 Apr 2025 14:03 UTC
9 points
0 comments · 19 min read · LW link

“The Era of Experience” has an unsolved technical alignment problem

Steven Byrnes · 24 Apr 2025 13:57 UTC
115 points
48 comments · 23 min read · LW link

AI #113: The o3 Era Begins

Zvi · 24 Apr 2025 13:40 UTC
38 points
4 comments · 62 min read · LW link
(thezvi.wordpress.com)

The Intelligence Curse: an essay series

24 Apr 2025 12:59 UTC
72 points
10 comments · 2 min read · LW link

Personal evaluation of LLMs, through chess

Karthik Tadepalli · 24 Apr 2025 7:01 UTC
20 points
4 comments · 2 min read · LW link

Intelligence explosion

samuelshadrach · 24 Apr 2025 6:35 UTC
2 points
0 comments · 4 min read · LW link
(samuelshadrach.com)

Cognitive Dissonance is Mentally Taxing

SorenJ · 24 Apr 2025 0:38 UTC
4 points
0 comments · 4 min read · LW link

My Favorite Productivity Blog Posts

Parker Conley · 24 Apr 2025 0:32 UTC
53 points
0 comments · 1 min read · LW link
(parconley.com)

What Physically Distinguishes a Brain with False Beliefs Using a Swimming Pool Example

YanLyutnev · 24 Apr 2025 0:01 UTC
6 points
0 comments · 7 min read · LW link

OpenAI Alums, Nobel Laureates Urge Regulators to Save Company’s Nonprofit Structure

garrison · 23 Apr 2025 23:01 UTC
66 points
0 comments · 8 min read · LW link
(garrisonlovely.substack.com)

What AI safety plans are there?

MichaelDickens · 23 Apr 2025 22:58 UTC
16 points
3 comments · 1 min read · LW link

o3 Is a Lying Liar

Zvi · 23 Apr 2025 20:00 UTC
84 points
26 comments · 9 min read · LW link
(thezvi.wordpress.com)

Putting up Bumpers

Sam Bowman · 23 Apr 2025 16:05 UTC
54 points
14 comments · 2 min read · LW link

The AI Belief-Consistency Letter

Knight Lee · 23 Apr 2025 12:01 UTC
−6 points
15 comments · 4 min read · LW link

Jaan Tallinn’s 2024 Philanthropy Overview

jaan · 23 Apr 2025 11:06 UTC
227 points
8 comments · 1 min read · LW link
(jaan.info)

[Question] Are we “being poisoned”?

Tigerlily · 23 Apr 2025 5:11 UTC
16 points
2 comments · 2 min read · LW link

To Understand History, Keep Former Population Distributions In Mind

Arjun Panickssery · 23 Apr 2025 4:51 UTC
240 points
13 comments · 2 min read · LW link
(arjunpanickssery.substack.com)

Fish and Faces

Eggs · 23 Apr 2025 3:35 UTC
8 points
6 comments · 2 min read · LW link

Is alignment reducible to becoming more coherent?

Cole Wyeth · 22 Apr 2025 23:47 UTC
19 points
0 comments · 3 min read · LW link

The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety

Katalina Hernandez · 22 Apr 2025 20:39 UTC
62 points
13 comments · 9 min read · LW link

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

22 Apr 2025 19:25 UTC
24 points
3 comments · 5 min read · LW link

Alignment from equivariance II—language equivariance as a way of figuring out what an AI “means”

hamishtodd1 · 22 Apr 2025 19:04 UTC
5 points
0 comments · 3 min read · LW link