Defining Monitorable and Useful Goals

Rubi J. Hudson
15 Jul 2025 23:06 UTC
11 points
0 comments
16 min read

[Question] Do you have any recommendations for readings on global risk forecasting and analysis applied to public policy design on a slightly smaller scale, or for more specific objectives?

Ana Lopez
15 Jul 2025 22:00 UTC
1 point
0 comments
1 min read

1 week fast on livestream for AI xrisk

samuelshadrach
15 Jul 2025 21:36 UTC
1 point
2 comments
1 min read

AISN #59: EU Publishes General-Purpose AI Code of Practice

15 Jul 2025 18:59 UTC
10 points
0 comments
4 min read
(aisafety.substack.com)

Principles for Picking Practical Interpretability Projects

Sam Marks
15 Jul 2025 17:38 UTC
27 points
0 comments
13 min read

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

15 Jul 2025 16:23 UTC
166 points
32 comments
1 min read
(bit.ly)

The Virtue of Fear and the Myth of “Fearlessness”

David_Veksler
15 Jul 2025 16:10 UTC
7 points
3 comments
1 min read

Grok 4 Various Things

Zvi
15 Jul 2025 15:50 UTC
50 points
4 comments
32 min read
(thezvi.wordpress.com)

Value systems of the frontier AIs, reduced to slogans

Mitchell_Porter
15 Jul 2025 15:10 UTC
4 points
0 comments
1 min read

What is David Chapman talking about when he talks about “meaning” in his book “Meaningness”?

SpectrumDT
15 Jul 2025 14:29 UTC
22 points
15 comments
2 min read

Why Eliminating Deception Won’t Align AI

Priyanka Bharadwaj
15 Jul 2025 9:21 UTC
19 points
6 comments
4 min read

Generalizing zombie arguments

jessicata
15 Jul 2025 5:09 UTC
23 points
9 comments
7 min read
(unstableontology.com)

Do confident short timelines make sense?

15 Jul 2025 3:37 UTC
138 points
76 comments
69 min read

Critic Contributions Are Logically Irrelevant

Zack_M_Davis
15 Jul 2025 1:03 UTC
27 points
74 comments
6 min read

AISafety.com Hackathon 2025

Bryce Robertson
15 Jul 2025 0:04 UTC
12 points
0 comments
1 min read

Don’t Say “I Want to Work In AI Policy”

henryj
14 Jul 2025 23:19 UTC
5 points
0 comments
2 min read
(www.henryjosephson.com)

Recent Redwood Research project proposals

14 Jul 2025 22:27 UTC
91 points
0 comments
3 min read

The Role of Respect: Why we inevitably appeal to authority

jimmy
14 Jul 2025 21:28 UTC
18 points
2 comments
12 min read

Making Sense of Consciousness Part 3: The Pulvinar Nucleus

sarahconstantin
14 Jul 2025 21:20 UTC
14 points
0 comments
10 min read
(sarahconstantin.substack.com)

LLM-induced craziness and base rates

Kaj_Sotala
14 Jul 2025 21:16 UTC
70 points
2 comments
2 min read
(andymasley.substack.com)

Narrow Misalignment is Hard, Emergent Misalignment is Easy

14 Jul 2025 21:05 UTC
130 points
23 comments
5 min read

What do you Want out of Literature Reviews?

Elizabeth
14 Jul 2025 20:20 UTC
25 points
4 comments
4 min read
(acesounderglass.com)

The Three Ideological Stances

14 Jul 2025 20:14 UTC
2 points
0 comments
3 min read
(cognition.cafe)

Visualizing AI Alignment – CFP for AGI-2025 Workshop (Aug 10, Live + Virtual)

CC4CI
14 Jul 2025 20:12 UTC
9 points
0 comments
4 min read

[Question] Is the political right becoming actively, explicitly antisemitic?

lc
14 Jul 2025 18:57 UTC
28 points
16 comments
1 min read

Weird Features in Protein LLMs: The Gram Lens

Jude Stiel
14 Jul 2025 17:32 UTC
8 points
0 comments
9 min read

METR: How Does Time Horizon Vary Across Domains?

14 Jul 2025 16:13 UTC
84 points
8 comments
14 min read
(metr.org)

Worse Than MechaHitler

Zvi
14 Jul 2025 16:00 UTC
53 points
1 comment
22 min read
(thezvi.wordpress.com)

How To Cause Less Suffering While Eating Animals

Bentham's Bulldog
14 Jul 2025 15:59 UTC
11 points
3 comments
4 min read

Self-preservation or Instruction Ambiguity? Examining the Causes of Shutdown Resistance

14 Jul 2025 14:52 UTC
67 points
18 comments
11 min read

Bernie Sanders (I-VT) mentions AI loss of control risk in Gizmodo interview

Matrice Jacobine
14 Jul 2025 14:47 UTC
42 points
2 comments
1 min read
(gizmodo.com)

Arrow theorem is an artifact of ordinal preferences

Arturo Macias
14 Jul 2025 14:08 UTC
7 points
4 comments
4 min read

Shanzson AI 2027 Timeline

shanzson
14 Jul 2025 10:21 UTC
13 points
11 comments
8 min read
(mirror.xyz)

Lead, Own, Share: Sovereign Wealth Funds for Transformative AI

Matrice Jacobine
14 Jul 2025 9:34 UTC
8 points
0 comments
1 min read
(www.convergenceanalysis.org)

Deliberative Credit Assignment: Making Faithful Reasoning Profitable

Florian_Dietz
14 Jul 2025 9:26 UTC
9 points
3 comments
17 min read

The History of FSRS for Anki

L.M.Sherlock
14 Jul 2025 8:11 UTC
26 points
0 comments
14 min read
(l-m-sherlock.notion.site)

Don’t fight your LLM, redirect it!

Yair Halberstadt
14 Jul 2025 6:50 UTC
19 points
2 comments
1 min read

Actionable Moderation Proposals from comments tree

ProgramCrafter
14 Jul 2025 6:41 UTC
6 points
0 comments
2 min read

Aspiring to Great Solstice Speeches: Mostly-Obvious Advice

Czynski
14 Jul 2025 2:29 UTC
9 points
5 comments
14 min read

Why are effect sizes so small?

Jacob Goldsmith
14 Jul 2025 1:17 UTC
1 point
0 comments
4 min read

Liv Boeree—non-zero hero

James Stephen Brown
13 Jul 2025 23:49 UTC
1 point
0 comments
2 min read
(nonzerosum.games)

Moloch’s Demise—solving the original problem

James Stephen Brown
13 Jul 2025 23:29 UTC
9 points
8 comments
1 min read
(nonzerosum.games)

4 Ways Moloch is Ruining Your Life!—a listicle that shows Moloch is all around us, even in listicles

James Stephen Brown
13 Jul 2025 23:27 UTC
5 points
0 comments
2 min read
(nonzerosum.games)

Three Missing Cakes, or One Turbulent Critic?

Benquo
13 Jul 2025 23:08 UTC
31 points
21 comments
3 min read

O(1) reasoning in latent space: 1ms inference, 77% accuracy, no attention or tokens

Founder Order One
13 Jul 2025 22:54 UTC
−11 points
9 comments
2 min read

On actually taking expressions literally: tension as the key to meditation?

Chris_Leong
13 Jul 2025 22:49 UTC
16 points
12 comments
5 min read

[Question] Why is LW not about winning?

azergante
13 Jul 2025 22:36 UTC
21 points
21 comments
1 min read

LLMs are stuck in Plato’s cave

Sean Herrington
13 Jul 2025 20:37 UTC
7 points
3 comments
6 min read

Do LLMs know what they’re capable of? Why this matters for AI safety, and initial findings

13 Jul 2025 19:54 UTC
51 points
5 comments
18 min read

10x more train­ing com­pute = 5x greater task length (kind of)

Expertium
13 Jul 2025 18:40 UTC
48 points
8 comments
2 min read