What can you do with barely any data?

ohmurphy · 10 May 2026 23:13 UTC
20 points
1 comment · 4 min read · LW link
(ohmurphy.substack.com)

The Anti-Singularity

Logan Zoellner · 10 May 2026 22:33 UTC
11 points
7 comments · 4 min read · LW link

Clarifying the role of the behavioral selection model

Alex Mallen · 10 May 2026 19:41 UTC
17 points
0 comments · 4 min read · LW link

AI Alignment as Equilibrium Design

Elad Hazan · 10 May 2026 18:56 UTC
19 points
4 comments · 5 min read · LW link

Claude Does Not Actually Taste Bananas: Potassium-Based Synthetic Phenomenology In Language Models

Noah Weinberger · 10 May 2026 17:13 UTC
8 points
2 comments · 10 min read · LW link
(huggingface.co)

The Darwinian Honeymoon - Why I am not as impressed by human progress as I used to be

Elias Schmied · 10 May 2026 15:55 UTC
138 points
23 comments · 4 min read · LW link

Reinforcement learning scaling might incentivise hidden reasoning architectures for AI

Oliver Sourbut · 10 May 2026 15:30 UTC
19 points
5 comments · 6 min read · LW link
(www.oliversourbut.net)

Asymmetry Between Defensive and Acquisitive Instrumental Deception

keith_wynroe · 10 May 2026 12:33 UTC
17 points
1 comment · 5 min read · LW link

Context Modification as a Negative Alignment Tax

Florian_Dietz · 10 May 2026 11:32 UTC
7 points
0 comments · 4 min read · LW link

‘Who Let The Docs Out’ Is Awarding Up To $50K For 6 Doc Filmmakers During A LIVE Pitch Competition In LA! Application Deadline: May 19th

Max Hellier · 10 May 2026 11:08 UTC
1 point
0 comments · 1 min read · LW link
(docsout.org)

[Question] Best Intro AI X-Risk Resource?

XelaP · 10 May 2026 11:03 UTC
12 points
3 comments · 2 min read · LW link

Stockholm ACX Fika

Ave Mariekex · 10 May 2026 5:46 UTC
1 point
0 comments · 1 min read · LW link

Control Debt

Ida Caspary · 10 May 2026 5:07 UTC
11 points
0 comments · 7 min read · LW link

Sawtooth Problems

Alexander Slugworth · 10 May 2026 5:01 UTC
54 points
14 comments · 21 min read · LW link

Could Frontier AI Researchers Collectively Slow the Race? A Conditional Pledge Mechanism

Cassandra Threshold · 10 May 2026 3:22 UTC
21 points
2 comments · 7 min read · LW link

Somerville Porchfest 2026

jefftk · 10 May 2026 1:20 UTC
10 points
0 comments · 3 min read · LW link
(www.jefftk.com)

The AI Industrial Explosion - Part 2: Transition Dynamics

djbinder · 10 May 2026 1:02 UTC
23 points
0 comments · 12 min read · LW link
(defensesindepth.bio)

The Goblins Are the Paperclips

Hisku · 9 May 2026 22:51 UTC
12 points
0 comments · 3 min read · LW link

International Law Cannot Prevent Extinction Either

Sausage Vector Machine · 9 May 2026 22:34 UTC
102 points
16 comments · 5 min read · LW link

Avoid alienating the marginal audience member

winfield · 9 May 2026 22:20 UTC
5 points
0 comments · 3 min read · LW link

Do capabilities generalize across propensities?

Emil Ryd · 9 May 2026 21:39 UTC
25 points
0 comments · 8 min read · LW link

Neural Networks learn Bloom Filters

Alex Gibson · 9 May 2026 20:32 UTC
57 points
1 comment · 12 min read · LW link

Explaining Volition Without Resorting to Free Will

joseph_c · 9 May 2026 18:57 UTC
20 points
24 comments · 1 min read · LW link

Second order thoughts on current AI agents

Michael Flood · 9 May 2026 18:40 UTC
14 points
0 comments · 2 min read · LW link

If digital computers are conscious, they are conscious at the hardware level

cube_flipper · 9 May 2026 15:08 UTC
38 points
42 comments · 19 min read · LW link
(smoothbrains.net)

Why You Can’t Use Your Right to Try

Stephen Martin · 9 May 2026 6:47 UTC
43 points
2 comments · 5 min read · LW link
(x.com)

Does Opus 4.7 Generate Deceptive Denials About Its Own Guardrails?

usize · 9 May 2026 4:12 UTC
10 points
0 comments · 3 min read · LW link
(usize.github.io)

Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis

Linch · 9 May 2026 1:30 UTC
264 points
74 comments · 3 min read · LW link

We Should Have Mandatory Media/Communications Training For All Communicators

Darren McKee · 8 May 2026 20:29 UTC
2 points
6 comments · 3 min read · LW link

Chess as a prediction model of the artificial intelligence impact on culture

849 · 8 May 2026 20:19 UTC
−12 points
1 comment · 5 min read · LW link
(lojkine.art)

The Saturation View: some responses

wdmacaskill · 8 May 2026 17:32 UTC
25 points
6 comments · 8 min read · LW link

Is ProgramBench Impossible?

frmsaul · 8 May 2026 17:04 UTC
83 points
11 comments · 2 min read · LW link

Claude Code, Codex and Agentic Coding #8

Zvi · 8 May 2026 16:40 UTC
45 points
1 comment · 11 min read · LW link
(thezvi.wordpress.com)

AI is Breaking Two Vulnerability Cultures

jefftk · 8 May 2026 15:50 UTC
78 points
0 comments · 2 min read · LW link
(www.jefftk.com)

Please Be Serious

Oliver Kuperman · 8 May 2026 14:36 UTC
−11 points
15 comments · 2 min read · LW link

Write Cause You Have Something to Say

Logan Riggs · 8 May 2026 13:36 UTC
37 points
5 comments · 2 min read · LW link

Userland Alignment

Josh H · 8 May 2026 13:31 UTC
4 points
0 comments · 2 min read · LW link

A benchmark is a sensor

8 May 2026 13:24 UTC
36 points
4 comments · 3 min read · LW link

Bringing More Expertise to Bear on Alignment

8 May 2026 10:29 UTC
87 points
1 comment · 8 min read · LW link

The Jailbroken Boy of Rushmore

jdcampolargo · 8 May 2026 6:29 UTC
24 points
0 comments · 10 min read · LW link

Investigating the consequences of accidentally grading CoT during RL

papetoast · 8 May 2026 6:17 UTC
24 points
0 comments · 1 min read · LW link
(alignment.openai.com)

Uncertain Updates: May 2026

Gordon Seidoh Worley · 8 May 2026 1:20 UTC
14 points
2 comments · 1 min read · LW link
(www.uncertainupdates.com)

The Frictionless Double

zw5 · 7 May 2026 23:11 UTC
10 points
4 comments · 8 min read · LW link

The AI industry is where banking was in 2006. (We’re hiring)

felixgaston · 7 May 2026 21:52 UTC
53 points
1 comment · 2 min read · LW link
(forum.effectivealtruism.org)

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

7 May 2026 20:21 UTC
213 points
35 comments · 8 min read · LW link

Axes of Planning in LLMs + Partial Lit Review

NickyP · 7 May 2026 19:53 UTC
12 points
0 comments · 9 min read · LW link
(blog.sus.cat)

A review of “Investigating the consequences of accidentally grading CoT during RL”

Buck · 7 May 2026 18:06 UTC
76 points
1 comment · 8 min read · LW link

Try, even if they have you cold

WalterL · 7 May 2026 17:19 UTC
102 points
14 comments · 2 min read · LW link

Mechanistic estimation for wide random MLPs

Jacob_Hilton · 7 May 2026 16:20 UTC
85 points
5 comments · 5 min read · LW link
(www.alignment.org)

Over Eight Months of Progress in Two: Analyzing the Mythos Preview Capability Jump

Alvin Ånestrand · 7 May 2026 16:19 UTC
10 points
8 comments · 17 min read · LW link
(forecastingaifutures.substack.com)