Moving Past the Question of Consciousness: A Thought Experiment

Satya Benson · 19 Jun 2025 19:52 UTC
12 points
8 comments · 2 min read · LW link
(satchlj.com)

S-Expressions as a Design Language: A Tool for Deconfusion in Alignment

Johannes C. Mayer · 19 Jun 2025 19:03 UTC
5 points
0 comments · 6 min read · LW link

AISEC: Why not to be shy.

xen9 · 19 Jun 2025 18:16 UTC
4 points
1 comment · 1 min read · LW link

LLMs as amplifiers, not assistants

Caleb Biddulph · 19 Jun 2025 17:21 UTC
27 points
8 comments · 7 min read · LW link

How The Singer Sang His Tales

adamShimi · 19 Jun 2025 17:06 UTC
18 points
0 comments · 36 min read · LW link
(formethods.substack.com)

Key paths, plans and strategies to AI safety success

Adam Jones · 19 Jun 2025 16:56 UTC
13 points
0 comments · 6 min read · LW link
(bluedot.org)

AI safety techniques leveraging distillation

ryan_greenblatt · 19 Jun 2025 14:31 UTC
61 points
0 comments · 12 min read · LW link

Political Funding Expertise (Post 6 of 7 on AI Governance)

Mass_Driver · 19 Jun 2025 14:14 UTC
59 points
4 comments · 14 min read · LW link

Documents Are Dead. Long Live the Conversational Proxy.

8harath · 19 Jun 2025 14:01 UTC
−9 points
1 comment · 1 min read · LW link

[Question] How did you find out about AI Safety? Why and how did you get involved?

Ana Lopez · 19 Jun 2025 14:00 UTC
1 point
0 comments · 1 min read · LW link

A deep critique of AI 2027’s bad timeline models

titotal · 19 Jun 2025 13:29 UTC
372 points
40 comments · 39 min read · LW link
(titotal.substack.com)

AI #121 Part 1: New Connections

Zvi · 19 Jun 2025 13:00 UTC
32 points
12 comments · 39 min read · LW link
(thezvi.wordpress.com)

AI can win a conflict against us

19 Jun 2025 7:20 UTC
6 points
0 comments · 2 min read · LW link

Different goals may bring AI into conflict with us

19 Jun 2025 7:19 UTC
5 points
2 comments · 2 min read · LW link

My Failed AI Safety Research Projects (Q1/Q2 2025)

Adam Newgas · 19 Jun 2025 3:55 UTC
26 points
3 comments · 3 min read · LW link

TT Self Study Journal # 1

TristanTrim · 18 Jun 2025 23:36 UTC
8 points
6 comments · 6 min read · LW link

On May 1, 2033, humanity discovered that AI was fairly easy to align.

Yitz · 18 Jun 2025 19:57 UTC
10 points
3 comments · 1 min read · LW link

New Ethics for the AI Age

Matthieu Tehenan · 18 Jun 2025 19:30 UTC
1 point
0 comments · 6 min read · LW link

Gemini 2.5 Pro: From 0506 to 0605

Zvi · 18 Jun 2025 19:10 UTC
33 points
0 comments · 8 min read · LW link
(thezvi.wordpress.com)

Factored Cognition Strengthens Monitoring and Thwarts Attacks

Aaron Sandoval · 18 Jun 2025 18:28 UTC
29 points
0 comments · 25 min read · LW link

Sparsely-connected Cross-layer Transcoders

jacob_drori · 18 Jun 2025 17:13 UTC
51 points
3 comments · 12 min read · LW link

New Endorsements for “If Anyone Builds It, Everyone Dies”

Malo · 18 Jun 2025 16:30 UTC
488 points
55 comments · 4 min read · LW link
(intelligence.org)

Moral Alignment: An Idea I’m Embarrassed I Didn’t Think of Myself

Gordon Seidoh Worley · 18 Jun 2025 15:42 UTC
20 points
54 comments · 2 min read · LW link

This was meant for you

Logan Kieller · 18 Jun 2025 15:26 UTC
12 points
0 comments · 8 min read · LW link
(agenticconjectures.substack.com)

Children of War: Hidden dangers of an AI arms race

Peter Kuhn · 18 Jun 2025 15:19 UTC
4 points
0 comments · 7 min read · LW link

Open Source Search (Summary)

samuelshadrach · 18 Jun 2025 7:35 UTC
21 points
1 comment · 6 min read · LW link
(samuelshadrach.com)

Fictional Thinking and Real Thinking

johnswentworth · 17 Jun 2025 19:13 UTC
57 points
11 comments · 4 min read · LW link

The Curious Case of the bos_token

larry-dial · 17 Jun 2025 19:00 UTC
26 points
4 comments · 10 min read · LW link

AISN #57: The RAISE Act

17 Jun 2025 18:02 UTC
6 points
0 comments · 3 min read · LW link
(newsletter.safe.ai)

AI Safety at the Frontier: Paper Highlights, May ’25

gasteigerjo · 17 Jun 2025 17:16 UTC
6 points
0 comments · 8 min read · LW link
(aisafetyfrontier.substack.com)

[Linkpost] The lethal trifecta for AI agents: private data, untrusted content, and external communication

Gunnar_Zarncke · 17 Jun 2025 16:09 UTC
13 points
3 comments · 1 min read · LW link
(simonwillison.net)

Agentic Interpretability: A Strategy Against Gradual Disempowerment

17 Jun 2025 14:52 UTC
17 points
6 comments · 2 min read · LW link

Prover-Estimator Debate: A New Scalable Oversight Protocol

17 Jun 2025 13:53 UTC
89 points
19 comments · 5 min read · LW link

o3 Turns Pro

Zvi · 17 Jun 2025 13:50 UTC
30 points
1 comment · 14 min read · LW link
(thezvi.wordpress.com)

Watch R1 “think” with animated chains of thought

future_detective · 17 Jun 2025 10:38 UTC
4 points
0 comments · 1 min read · LW link
(github.com)

Serving LLM on Huawei CloudMatrix

sanxiyn · 17 Jun 2025 5:59 UTC
24 points
7 comments · 1 min read · LW link
(arxiv.org)

Personal agents

Roman Leventov · 17 Jun 2025 2:05 UTC
9 points
1 comment · 7 min read · LW link

I made a card game to reduce cognitive biases and logical fallacies but I’m not sure what DV to test in a study on its effectiveness.

Brad Dunn · 17 Jun 2025 1:02 UTC
50 points
15 comments · 5 min read · LW link

Notes on Meetup Ideas

Commander Zander · 17 Jun 2025 0:11 UTC
12 points
4 comments · 2 min read · LW link

Darkness Meditation—for NZ Winter Solstice 2025

joshuamerriam · 16 Jun 2025 23:58 UTC
2 points
0 comments · 4 min read · LW link

[Question] Are superhuman savants real?

Bunthut · 16 Jun 2025 22:02 UTC
15 points
4 comments · 1 min read · LW link

Ok, AI Can Write Pretty Good Fiction Now

JustisMills · 16 Jun 2025 21:13 UTC
59 points
34 comments · 6 min read · LW link
(justismills.substack.com)

Subjective experience is most likely physical

martinkunev · 16 Jun 2025 20:54 UTC
5 points
3 comments · 4 min read · LW link

VLMs can Aggregate Scattered Training Patches

LINGJIE CHEN · 16 Jun 2025 18:25 UTC
2 points
0 comments · 4 min read · LW link

Setpoint = The experience we attend to

jimmy · 16 Jun 2025 17:34 UTC
22 points
0 comments · 7 min read · LW link

Thought Crime: Backdoors & Emergent Misalignment in Reasoning Models

16 Jun 2025 16:43 UTC
69 points
2 comments · 8 min read · LW link

How LLM Beliefs Change During Chain-of-Thought Reasoning

16 Jun 2025 16:18 UTC
32 points
3 comments · 5 min read · LW link

Convergent Linear Representations of Emergent Misalignment

16 Jun 2025 15:47 UTC
76 points
1 comment · 8 min read · LW link

Model Organisms for Emergent Misalignment

16 Jun 2025 15:46 UTC
118 points
19 comments · 5 min read · LW link

Coaching AI: A Relational Approach to AI Safety

Priyanka Bharadwaj · 16 Jun 2025 15:33 UTC
11 points
0 comments · 5 min read · LW link