Centralization begets stagnation

Algon · 30 Oct 2025 23:49 UTC
6 points
0 comments · 2 min read · LW link

Summary and Comments on Anthropic's Pilot Sabotage Risk Report

GradientDissenter · 30 Oct 2025 20:19 UTC
29 points
0 comments · 5 min read · LW link

Critical Fallibilism and Theory of Constraints in One Analyzed Paragraph

Elliot Temple · 30 Oct 2025 20:06 UTC
2 points
0 comments · 28 min read · LW link

AI #140: Trying To Hold The Line

Zvi · 30 Oct 2025 18:30 UTC
26 points
1 comment · 52 min read · LW link
(thezvi.wordpress.com)

Anthropic's Pilot Sabotage Risk Report

dmz · 30 Oct 2025 17:50 UTC
32 points
2 comments · 3 min read · LW link
(alignment.anthropic.com)

AISLE discovered three new OpenSSL vulnerabilities

Jan_Kulveit · 30 Oct 2025 16:32 UTC
63 points
7 comments · 1 min read · LW link
(aisle.com)

Sonnet 4.5's eval gaming seriously undermines alignment evals, and this seems caused by training on alignment evals

30 Oct 2025 15:34 UTC
143 points
21 comments · 14 min read · LW link

Steering Evaluation-Aware Models to Act Like They Are Deployed

30 Oct 2025 15:03 UTC
61 points
12 comments · 16 min read · LW link

On The Conservation of Rights

Roman Maksimovich · 30 Oct 2025 13:48 UTC
−2 points
2 comments · 8 min read · LW link

When “HDMI-1” Lies To You

Gunnar_Zarncke · 30 Oct 2025 12:23 UTC
18 points
0 comments · 1 min read · LW link

[Question] Why there is still one instance of Eliezer Yudkowsky?

RomanS · 30 Oct 2025 12:00 UTC
−9 points
8 comments · 1 min read · LW link

Interview on the Hengshui Model High School

L.M.Sherlock · 30 Oct 2025 10:26 UTC
21 points
2 comments · 30 min read · LW link
(lmsherlock.substack.com)

Transcendental Argumentation and the Epistemics of Discourse

0xA · 30 Oct 2025 6:37 UTC
1 point
2 comments · 3 min read · LW link

Emergent Introspective Awareness in Large Language Models

Drake Thomas · 30 Oct 2025 4:42 UTC
129 points
19 comments · 1 min read · LW link
(transformer-circuits.pub)

Introducing Aeonisk: an Open Source Game and Dataset with Graded Outcome Tiers of Counterfactual Reasoning

threeriversainexus · 30 Oct 2025 3:02 UTC
1 point
0 comments · 4 min read · LW link

ImpossibleBench: Measuring Reward Hacking in LLM Coding Agents

Ziqian Zhong · 30 Oct 2025 2:52 UTC
60 points
5 comments · 3 min read · LW link
(arxiv.org)

LLM Hallucinations: An Internal Tug of War

violazhong · 30 Oct 2025 1:21 UTC
9 points
0 comments · 3 min read · LW link

Genius is Not About Genius

Algon · 30 Oct 2025 0:00 UTC
14 points
1 comment · 2 min read · LW link

Quotes on OpenAI's timelines to automated research, safety research, and safety collaborations before recursive self improvement

TheManxLoiner · 29 Oct 2025 21:47 UTC
15 points
0 comments · 3 min read · LW link

An Opinionated Guide to Privacy Despite Authoritarianism

TurnTrout · 29 Oct 2025 20:32 UTC
179 points
27 comments · 4 min read · LW link
(turntrout.com)

Unsureism: The Rational Approach to Religious Uncertainty

Taylor G. Lunt · 29 Oct 2025 19:45 UTC
−7 points
3 comments · 5 min read · LW link

Why you shouldn't eat meat if you hate factory farming

ceselder · 29 Oct 2025 17:00 UTC
5 points
4 comments · 4 min read · LW link

The End of OpenAI's Nonprofit Era

garrison · 29 Oct 2025 16:28 UTC
41 points
0 comments · 9 min read · LW link
(www.obsolete.pub)

An intro to the Tensor Economics blog

harsimony · 29 Oct 2025 16:24 UTC
15 points
0 comments · 12 min read · LW link
(splittinginfinity.substack.com)

Uncertain Updates: October 2025

Gordon Seidoh Worley · 29 Oct 2025 16:10 UTC
3 points
0 comments · 1 min read · LW link
(www.uncertainupdates.com)

AI Doomers Should Raise Hell

James_Miller · 29 Oct 2025 16:10 UTC
−2 points
9 comments · 6 min read · LW link

AISN #65: Measuring Automation and Superintelligence Moratorium Letter

29 Oct 2025 16:05 UTC
5 points
0 comments · 3 min read · LW link
(newsletter.safe.ai)

TBC Episode with Max Harms - Red Heart and If Anyone Builds It, Everyone Dies

Steven K Zuber · 29 Oct 2025 15:49 UTC
13 points
0 comments · 1 min read · LW link
(www.thebayesianconspiracy.com)

[Question] Thresholds for Pascal's Mugging?

MattAlexander · 29 Oct 2025 14:54 UTC
22 points
12 comments · 8 min read · LW link

Please Do Not Sell B30A Chips to China

Zvi · 29 Oct 2025 14:50 UTC
62 points
6 comments · 7 min read · LW link
(thezvi.wordpress.com)

Why Civilizations Are Unstable (And What This Means for AI Alignment)

Elias_Kunnas · 29 Oct 2025 12:27 UTC
10 points
6 comments · 5 min read · LW link

What can we learn from parent-child-alignment for AI?

Karl von Wendt · 29 Oct 2025 8:02 UTC
16 points
4 comments · 3 min read · LW link

Some data from LeelaPieceOdds

Jeremy Gillen · 29 Oct 2025 4:27 UTC
66 points
21 comments · 3 min read · LW link

How Do We Evaluate the Quality of LLMs' Mathematical Responses?

Miguel Angel · 29 Oct 2025 1:37 UTC
5 points
0 comments · 13 min read · LW link

Visualizing a Platform for Live World Models

Kuil · 29 Oct 2025 1:24 UTC
16 points
0 comments · 14 min read · LW link

[Question] Why Would we get Inner Misalignment by Default?

Coil · 29 Oct 2025 1:23 UTC
3 points
0 comments · 2 min read · LW link

A Very Simple Model of AI Dealmaking

Cleo Nardo · 29 Oct 2025 0:33 UTC
18 points
0 comments · 9 min read · LW link

Upcoming Workshop on Post-AGI Economics, Culture, and Governance

28 Oct 2025 21:55 UTC
37 points
1 comment · 2 min read · LW link

AI Craziness Mitigation Efforts

Zvi · 28 Oct 2025 19:00 UTC
37 points
5 comments · 11 min read · LW link
(thezvi.wordpress.com)

When Will AI Transform the Economy?

Andre.Infante · 28 Oct 2025 18:55 UTC
60 points
2 comments · 8 min read · LW link

Introducing the Epoch Capabilities Index (ECI)

28 Oct 2025 18:23 UTC
65 points
9 comments · 1 min read · LW link
(epoch.ai)

Mottes and Baileys in AI discourse

Raemon · 28 Oct 2025 17:50 UTC
51 points
9 comments · 9 min read · LW link

Temporarily Losing My Ego

Logan Riggs · 28 Oct 2025 16:41 UTC
21 points
4 comments · 3 min read · LW link

The Memetics of AI Successionism

Jan_Kulveit · 28 Oct 2025 15:04 UTC
212 points
54 comments · 9 min read · LW link

New 80,000 Hours problem profile on the risks of power-seeking AI

Zershaaneh Qureshi · 28 Oct 2025 14:37 UTC
7 points
0 comments · 2 min read · LW link

LLM robots can't pass butter (and they are having an existential crisis about it)

Lukas Petersson · 28 Oct 2025 14:14 UTC
105 points
7 comments · 4 min read · LW link

Call for mentors from AI Safety and academia. Sci.STEPS mentorship program

Valentin2026 · 28 Oct 2025 13:41 UTC
7 points
0 comments · 2 min read · LW link

Heuristics for assessing how much of a bubble AI is in/will be

Remmelt · 28 Oct 2025 8:08 UTC
8 points
2 comments · 2 min read · LW link
(www.wired.com)

Q2 AI Benchmark Results: Pros Maintain Clear Lead

28 Oct 2025 5:40 UTC
14 points
0 comments · 24 min read · LW link
(www.metaculus.com)

A Sketch of Helpfulness Theory With Equivocal Principals

Lorxus · 28 Oct 2025 4:11 UTC
7 points
1 comment · 6 min read · LW link
(tiled-with-pentagons.blogspot.com)