Does Opus 4.7 Generate Deceptive Denials About Its Own Guardrails?

usize
9 May 2026 4:12 UTC
8 points
0 comments
3 min read
LW link
(usize.github.io)

Bad Problems Don’t Stop Being Bad Because Somebody’s Wrong About Fault Analysis

Linch
9 May 2026 1:30 UTC
63 points
9 comments
3 min read
LW link

We Should Have Mandatory Media/Communications Training For All Communicators

Darren McKee
8 May 2026 20:29 UTC
−5 points
6 comments
3 min read
LW link

Chess as a prediction model of the artificial intelligence impact on culture

849
8 May 2026 20:19 UTC
−3 points
1 comment
5 min read
LW link
(lojkine.art)

The Saturation View: some responses

wdmacaskill
8 May 2026 17:32 UTC
17 points
4 comments
8 min read
LW link

Is ProgramBench Impossible?

frmsaul
8 May 2026 17:04 UTC
60 points
3 comments
2 min read
LW link

AI is Breaking Two Vulnerability Cultures

jefftk
8 May 2026 15:50 UTC
51 points
0 comments
2 min read
LW link
(www.jefftk.com)

Write Cause You Have Something to Say

Logan Riggs
8 May 2026 13:36 UTC
29 points
2 comments
2 min read
LW link

Userland Alignment

Josh H
8 May 2026 13:31 UTC
2 points
0 comments
2 min read
LW link

A benchmark is a sensor

8 May 2026 13:24 UTC
25 points
0 comments
3 min read
LW link

Bringing More Expertise to Bear on Alignment

8 May 2026 10:29 UTC
55 points
0 comments
8 min read
LW link

Investigating the consequences of accidentally grading CoT during RL

papetoast
8 May 2026 6:17 UTC
13 points
0 comments
1 min read
LW link
(alignment.openai.com)

The Frictionless Double

zw5
7 May 2026 23:11 UTC
8 points
4 comments
8 min read
LW link

Natural Language Autoencoders Produce Unsupervised Explanations of LLM Activations

7 May 2026 20:21 UTC
186 points
26 comments
8 min read
LW link

Axes of Planning in LLMs + Partial Lit Review

NickyP
7 May 2026 19:53 UTC
12 points
0 comments
9 min read
LW link
(blog.sus.cat)

A review of “Investigating the consequences of accidentally grading CoT during RL”

Buck
7 May 2026 18:06 UTC
67 points
0 comments
8 min read
LW link

Try, even if they have you cold

WalterL
7 May 2026 17:19 UTC
81 points
4 comments
2 min read
LW link

Mechanistic estimation for wide random MLPs

Jacob_Hilton
7 May 2026 16:20 UTC
73 points
3 comments
5 min read
LW link
(www.alignment.org)

How to get better at chess (and everything else)

Sean Herrington
7 May 2026 11:17 UTC
11 points
0 comments
3 min read
LW link
(www.chess.com)

Multipolar Civilisation Depends on Maintaining an Attacker’s Dilemma

Naci Cankaya
7 May 2026 11:13 UTC
21 points
1 comment
5 min read
LW link
(nacicankaya.substack.com)