[Question] How could I tell someone that consciousness is not the primary concern of AI Safety?

Lysandre Terrisse · 13 Jun 2025 22:44 UTC
11 points
2 comments · 3 min read · LW link

Debate experiments at The Curve, LessOnline and Manifest

Nathan Young · 13 Jun 2025 22:35 UTC
36 points
12 comments · 5 min read · LW link
(nathanpmyoung.substack.com)

Futarchy’s fundamental flaw

dynomight · 13 Jun 2025 22:08 UTC
185 points
49 comments · 9 min read · LW link
(dynomight.net)

The Pros and Cons of Being Among Your Tribe

Sable · 13 Jun 2025 21:41 UTC
39 points
0 comments · 7 min read · LW link
(affablyevil.substack.com)

Constraining Minds, Not Goals: A Structural Approach to AI Alignment

Johannes C. Mayer · 13 Jun 2025 21:06 UTC
25 points
0 comments · 9 min read · LW link

The optimal level of optimization is suboptimal

ellifournier · 13 Jun 2025 18:06 UTC
4 points
4 comments · 1 min read · LW link
(ellifournier.substack.com)

On Pruning an Overgrown Garden

Vaatzes · 13 Jun 2025 17:54 UTC
3 points
3 comments · 6 min read · LW link

Learned helplessness about “teaching to the test”

Viliam · 13 Jun 2025 17:53 UTC
36 points
16 comments · 3 min read · LW link

Information-Dense Conference Badges

ozziegooen · 13 Jun 2025 17:52 UTC
28 points
4 comments · 4 min read · LW link
(ozziegooen.substack.com)

The Superwisdom Thesis: Why Superintelligence Does Not Pose An Existential Threat

Max Abecassis · 13 Jun 2025 17:35 UTC
−23 points
9 comments · 30 min read · LW link

The Boat Theft Theory of Consciousness

Lorec · 13 Jun 2025 16:38 UTC
43 points
36 comments · 2 min read · LW link

Monthly Roundup #31: June 2025

Zvi · 13 Jun 2025 16:20 UTC
37 points
3 comments · 50 min read · LW link
(thezvi.wordpress.com)

Unsupervised Elicitation of Language Models

13 Jun 2025 16:15 UTC
57 points
12 comments · 2 min read · LW link

Lucky Omega Problem

Tapatakt · 13 Jun 2025 14:54 UTC
10 points
4 comments · 4 min read · LW link

Distillation Robustifies Unlearning

13 Jun 2025 13:45 UTC
239 points
43 comments · 8 min read · LW link
(arxiv.org)

Self-Adapting Language Models (from MIT, arXiv preprint)

Person · 13 Jun 2025 13:08 UTC
5 points
1 comment · 1 min read · LW link

Do Not Tile the Lightcone with Your Confused Ontology

Jan_Kulveit · 13 Jun 2025 12:45 UTC
236 points
27 comments · 5 min read · LW link
(boundedlyrational.substack.com)

Corporations as Paperclip/Profit Maximizers

busssard · 13 Jun 2025 10:55 UTC
17 points
3 comments · 22 min read · LW link

4. Why existing approaches to cause prioritization are not robust to unawareness

Anthony DiGiovanni · 13 Jun 2025 8:55 UTC
26 points
0 comments · 16 min read · LW link

[Question] Under what conditions should humans stop pursuing technical AI safety careers?

S. Alex Bradt · 13 Jun 2025 5:56 UTC
6 points
0 comments · 1 min read · LW link

[linkpost] AI Alignment is About Culture, Not Control by JCorvinus

Milan W · 13 Jun 2025 0:07 UTC
1 point
8 comments · 1 min read · LW link
(jcorvinus.medium.com)

Forecast AI 2027

ChristianWilliams · 12 Jun 2025 21:12 UTC
20 points
0 comments · 1 min read · LW link
(www.metaculus.com)

CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions

Annapurna · 12 Jun 2025 19:53 UTC
8 points
0 comments · 1 min read · LW link
(arxiv.org)

When does training a model change its goals?

12 Jun 2025 18:43 UTC
79 points
3 comments · 15 min read · LW link

Restraining Factors in AI Alignment Systems

theophilus tabuke · 12 Jun 2025 18:17 UTC
1 point
1 comment · 1 min read · LW link

Analysis of Automated Prompt Engineering for Forecasting

ChristianWilliams · 12 Jun 2025 15:49 UTC
6 points
0 comments · 7 min read · LW link
(www.metaculus.com)

AI #120: While o3 Turned Pro

Zvi · 12 Jun 2025 15:30 UTC
51 points
3 comments · 53 min read · LW link
(thezvi.wordpress.com)

Towards mutually assured cooperation

mikko · 12 Jun 2025 15:15 UTC
5 points
0 comments · 1 min read · LW link

What If We Could Monitor Human Intent?

Saif Khan · 12 Jun 2025 8:51 UTC
−8 points
6 comments · 3 min read · LW link

The Way of a Skeptic

Martin Sustrik · 12 Jun 2025 5:40 UTC
38 points
2 comments · 6 min read · LW link
(www.250bpm.com)

[Question] When should you read a biography?

CstineSublime · 12 Jun 2025 5:19 UTC
3 points
6 comments · 3 min read · LW link

An Easily Overlooked Post on the Automation of Wisdom and Philosophy

Chris_Leong · 12 Jun 2025 2:54 UTC
19 points
0 comments · 1 min read · LW link
(blog.aiimpacts.org)

Maybe Social Anxiety Is Just You Failing At Mind Control

25Hour · 11 Jun 2025 23:49 UTC
84 points
21 comments · 16 min read · LW link

OpenAI now has an RL API which is broadly accessible

ryan_greenblatt · 11 Jun 2025 23:39 UTC
44 points
1 comment · 5 min read · LW link

So You Want to Work at a Frontier AI Lab

Joe Rogero · 11 Jun 2025 23:11 UTC
54 points
14 comments · 7 min read · LW link
(intelligence.org)

Commentary On The Turing Apocrypha

jdp · 11 Jun 2025 22:52 UTC
25 points
0 comments · 11 min read · LW link
(minihf.com)

[Question] My friend wants a good book recommendation to understand AI, AI safety, and the field, and probably the drama. He’s smart but non-technical and not keeping up with trends. Any recs?

JohnGreer · 11 Jun 2025 22:32 UTC
9 points
0 comments · 1 min read · LW link

The Dunning-Dunning-Kruger-Kruger Effect

ellifournier · 11 Jun 2025 21:02 UTC
−1 points
2 comments · 1 min read · LW link
(ellifournier.substack.com)

A Revision to Market Monetarism: Individual Hoarding as Rational, Competition for Dollars as Zero-Sum?

Lorec · 11 Jun 2025 20:13 UTC
4 points
0 comments · 4 min read · LW link

Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability

11 Jun 2025 19:30 UTC
6 points
0 comments · 5 min read · LW link

The Dream of a Gentle Singularity

Zvi · 11 Jun 2025 19:30 UTC
57 points
7 comments · 12 min read · LW link
(thezvi.wordpress.com)

Beware General Claims about “Generalizable Reasoning Capabilities” (of Modern AI Systems)

LawrenceC · 11 Jun 2025 19:27 UTC
317 points
19 comments · 16 min read · LW link

Religion for Rationalists

Gordon Seidoh Worley · 11 Jun 2025 19:05 UTC
27 points
65 comments · 4 min read · LW link

Difficulties of Eschatological policy making [Linkpost]

Noosphere89 · 11 Jun 2025 14:12 UTC
11 points
3 comments · 3 min read · LW link
(jack-clark.net)

Hydra

Matrice Jacobine · 11 Jun 2025 14:07 UTC
24 points
0 comments · 1 min read · LW link
(philosophybear.substack.com)

SafeRLHub: An Interactive Resource for RL Safety and Interpretability

11 Jun 2025 5:47 UTC
11 points
0 comments · 7 min read · LW link

More on policy arguments and the AB problem

Sniffnoy · 11 Jun 2025 4:42 UTC
10 points
0 comments · 4 min read · LW link

Using AI Video Generation to Re-create Memories

Annapurna · 11 Jun 2025 4:06 UTC
−1 points
2 comments · 1 min read · LW link

Conflicted on AI Politics

jefftk · 11 Jun 2025 3:40 UTC
27 points
5 comments · 2 min read · LW link
(www.jefftk.com)

the void

nostalgebraist · 11 Jun 2025 3:19 UTC
424 points
108 comments · 1 min read · LW link
(nostalgebraist.tumblr.com)