LLM Pareto Frontier But Live

winstonBosan · Apr 24, 2025, 9:22 PM
8 points
0 comments · 1 min read

Modifying LLM Beliefs with Synthetic Document Finetuning

Apr 24, 2025, 9:15 PM
70 points
12 comments · 2 min read
(alignment.anthropic.com)

This prompt (sometimes) makes ChatGPT think about terrorist organisations

jakub_krys · Apr 24, 2025, 9:15 PM
30 points
13 comments · 1 min read

Severe control over AI agents as a tool for mass-surveillance

Andrey Seryakov · Apr 24, 2025, 8:27 PM
2 points
0 comments · 3 min read

Token and Taboo

Guive · Apr 24, 2025, 8:17 PM
31 points
6 comments · 4 min read
(guive.substack.com)

Trouble at Miningtown: Prologue

Quinn · Apr 24, 2025, 7:09 PM
19 points
0 comments · 4 min read

Training-time schemers vs behavioral schemers

Alex Mallen · Apr 24, 2025, 7:07 PM
44 points
9 comments · 6 min read

Reward hacking is becoming more sophisticated and deliberate in frontier LLMs

Kei · Apr 24, 2025, 4:03 PM
94 points
6 comments · 1 min read

Finding an Error-Detection Feature in DeepSeek-R1

keith_wynroe · Apr 24, 2025, 4:03 PM
15 points
0 comments · 7 min read

Anticipating AI: Keeping Up With What We Build

Alvin Ånestrand · Apr 24, 2025, 3:23 PM
2 points
0 comments · 11 min read
(forecastingaifutures.substack.com)

Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?

Matrice Jacobine · Apr 24, 2025, 2:11 PM
12 points
4 comments · 1 min read
(limit-of-rlvr.github.io)

Academia as a happy place?

Apr 24, 2025, 2:03 PM
9 points
0 comments · 19 min read

“The Era of Experience” has an unsolved technical alignment problem

Steven Byrnes · Apr 24, 2025, 1:57 PM
115 points
48 comments · 23 min read

AI #113: The o3 Era Begins

Zvi · Apr 24, 2025, 1:40 PM
38 points
4 comments · 62 min read
(thezvi.wordpress.com)

The Intelligence Curse: an essay series

Apr 24, 2025, 12:59 PM
68 points
10 comments · 2 min read

Personal evaluation of LLMs, through chess

Karthik Tadepalli · Apr 24, 2025, 7:01 AM
20 points
4 comments · 2 min read

Intelligence explosion

samuelshadrach · Apr 24, 2025, 6:35 AM
2 points
0 comments · 4 min read
(samuelshadrach.com)

Cognitive Dissonance is Mentally Taxing

SorenJ · Apr 24, 2025, 12:38 AM
4 points
0 comments · 4 min read

My Favorite Productivity Blog Posts

Parker Conley · Apr 24, 2025, 12:32 AM
53 points
0 comments · 1 min read
(parconley.com)

What Physically Distinguishes a Brain with False Beliefs Using a Swimming Pool Example

YanLyutnev · Apr 24, 2025, 12:01 AM
6 points
0 comments · 7 min read

OpenAI Alums, Nobel Laureates Urge Regulators to Save Company’s Nonprofit Structure

garrison · Apr 23, 2025, 11:01 PM
66 points
0 comments · 8 min read
(garrisonlovely.substack.com)

What AI safety plans are there?

MichaelDickens · Apr 23, 2025, 10:58 PM
16 points
3 comments · 1 min read

o3 Is a Lying Liar

Zvi · Apr 23, 2025, 8:00 PM
84 points
26 comments · 9 min read
(thezvi.wordpress.com)

Putting up Bumpers

Sam Bowman · Apr 23, 2025, 4:05 PM
52 points
14 comments · 2 min read

The AI Belief-Consistency Letter

Knight Lee · Apr 23, 2025, 12:01 PM
−6 points
15 comments · 4 min read

Jaan Tallinn’s 2024 Philanthropy Overview

jaan · Apr 23, 2025, 11:06 AM
223 points
8 comments · 1 min read
(jaan.info)

[Question] Are we “being poisoned”?

Tigerlily · Apr 23, 2025, 5:11 AM
16 points
2 comments · 2 min read

To Understand History, Keep Former Population Distributions In Mind

Arjun Panickssery · Apr 23, 2025, 4:51 AM
240 points
13 comments · 2 min read
(arjunpanickssery.substack.com)

Fish and Faces

Eggs · Apr 23, 2025, 3:35 AM
8 points
6 comments · 2 min read

Is alignment reducible to becoming more coherent?

Cole Wyeth · Apr 22, 2025, 11:47 PM
19 points
0 comments · 3 min read

The EU Is Asking for Feedback on Frontier AI Regulation (Open to Global Experts)—This Post Breaks Down What’s at Stake for AI Safety

Katalina Hernandez · Apr 22, 2025, 8:39 PM
60 points
13 comments · 9 min read

Corrupted by Reasoning: Reasoning Language Models Become Free-Riders in Public Goods Games

Apr 22, 2025, 7:25 PM
24 points
3 comments · 5 min read

Alignment from equivariance II—language equivariance as a way of figuring out what an AI “means”

hamishtodd1 · Apr 22, 2025, 7:04 PM
5 points
0 comments · 3 min read

There is no Red Line

Tachikoma · Apr 22, 2025, 6:28 PM
−13 points
1 comment · 3 min read

Manifund 2025 Regrants

Austin Chen · Apr 22, 2025, 5:36 PM
21 points
0 comments · 5 min read
(manifund.substack.com)

AISN#52: An Expert Virology Benchmark

Apr 22, 2025, 5:08 PM
6 points
0 comments · 4 min read
(newsletter.safe.ai)

Intuition in AI

Priyanka Bharadwaj · Apr 22, 2025, 3:15 PM
−1 points
2 comments · 2 min read

Problems with Bayesianism: A Socratic Dialogue

B Jacobs · Apr 22, 2025, 2:09 PM
3 points
1 comment · 14 min read
(bobjacobs.substack.com)

Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt

Apr 22, 2025, 1:21 PM
47 points
24 comments · 25 min read

You Better Mechanize

Zvi · Apr 22, 2025, 1:10 PM
74 points
6 comments · 20 min read
(thezvi.wordpress.com)

Experimental testing: can I treat myself as a random sample?

avturchin · Apr 22, 2025, 12:34 PM
9 points
41 comments · 4 min read

Family-line selection optimizer

lemonhope · Apr 22, 2025, 7:16 AM
2 points
0 comments · 1 min read

Accountability Sinks

Martin Sustrik · Apr 22, 2025, 5:00 AM
423 points
57 comments · 15 min read
(250bpm.substack.com)

Most AI value will come from broad automation, not from R&D

Matthew Barnett · Apr 22, 2025, 3:22 AM
10 points
6 comments · 2 min read
(epoch.ai)

Estimat (8 Identities)

P. João · Apr 22, 2025, 2:42 AM
4 points
0 comments · 3 min read

A Letter to His Highness Louis XV, the King of France

testingthewaters · Apr 22, 2025, 12:51 AM
2 points
0 comments · 1 min read
(aclevername.substack.com)

10 Principles for Real Alignment

Adriaan · Apr 21, 2025, 10:18 PM
−7 points
0 comments · 7 min read

AE Studio is hiring!

Trent Hodgeson · Apr 21, 2025, 8:35 PM
20 points
2 comments · 2 min read

$500 Bounty Problem: Are (Approximately) Deterministic Natural Latents All You Need?

Apr 21, 2025, 8:19 PM
92 points
24 comments · 3 min read

More Than Just A, T, C, and G: Screening for Hidden Dangers in DNA Sequences

sgd · Apr 21, 2025, 8:12 PM
1 point
0 comments · 11 min read