Why do Mechanistic Interpretability?

Prudhviraj Naidu, 17 Jul 2025 23:21 UTC
2 points
0 comments, 5 min read

Ketamine Part 1: Dosing

Elizabeth, 17 Jul 2025 20:10 UTC
25 points
0 comments, 7 min read
(acesounderglass.com)

Aurelius: A Peer-to-Peer Alignment Protocol

Austin McCaffrey, 17 Jul 2025 19:13 UTC
3 points
4 comments, 1 min read
(github.com)

Self-Control is now an Engineering Problem

Josh Mitchell, 17 Jul 2025 18:13 UTC
−11 points
4 comments, 5 min read

Video and transcript of talk on “Can goodness compete?”

Joe Carlsmith, 17 Jul 2025 17:54 UTC
98 points
19 comments, 34 min read
(joecarlsmith.substack.com)

Are agent-action-dependent beliefs underdetermined by external reality?

Said Achmiz, 17 Jul 2025 14:33 UTC
21 points
16 comments, 6 min read

AI #125: Smooth Criminal

Zvi, 17 Jul 2025 14:30 UTC
33 points
0 comments, 56 min read
(thezvi.wordpress.com)

AI Offense Defense Balance in a Multipolar World

17 Jul 2025 9:34 UTC
15 points
5 comments, 18 min read
(www.existentialriskobservatory.org)

Biweekly AI Safety Comms Meetup

Vishakha, 17 Jul 2025 7:50 UTC
5 points
0 comments, 1 min read

Do you care about your clone?

Harry Partridge, 17 Jul 2025 6:06 UTC
8 points
7 comments, 2 min read

Comment on “Four Layers of Intellectual Conversation”

Zack_M_Davis, 17 Jul 2025 3:53 UTC
64 points
11 comments, 5 min read

Towards plausible moral naturalism

jessicata, 17 Jul 2025 1:51 UTC
17 points
9 comments, 9 min read
(unstableontology.com)

Assign Probabilities Functorially

kaleb, 17 Jul 2025 1:49 UTC
8 points
6 comments, 9 min read

Trying the Obvious Thing

16 Jul 2025 22:24 UTC
35 points
2 comments, 3 min read
(cognition.cafe)

Emergence vs Entropy—a universal paradox

James Stephen Brown, 16 Jul 2025 21:31 UTC
4 points
0 comments, 4 min read

Selective Generalization: Improving Capabilities While Maintaining Alignment

16 Jul 2025 21:25 UTC
66 points
4 comments, 7 min read

Bodydouble / Thinking Assistant matchmaking

Raemon, 16 Jul 2025 19:54 UTC
51 points
10 comments, 2 min read

Zero sum expectations as an explanation of omnicide-indifference

asasz, 16 Jul 2025 19:25 UTC
2 points
6 comments, 2 min read

On the geometrical Nature of Insight

Giuseppe Birardi, 16 Jul 2025 19:12 UTC
3 points
0 comments, 41 min read

Vancouver Rationalists/Transhumanists/Futurists Beach Meetup

apocalypticc, 16 Jul 2025 19:09 UTC
2 points
0 comments, 1 min read

What is the probability that future AI development will be seriously delayed or ended due to energy decline ?

AdamLacerdo, 16 Jul 2025 19:08 UTC
−1 points
12 comments, 1 min read

Rebooting the Singularity

cdkg, 16 Jul 2025 18:26 UTC
8 points
0 comments, 1 min read
(philpapers.org)

Being and Existence

Gordon Seidoh Worley, 16 Jul 2025 18:10 UTC
7 points
0 comments, 3 min read
(uncertainupdates.substack.com)

Kimi K2

Zvi, 16 Jul 2025 16:20 UTC
52 points
5 comments, 12 min read
(thezvi.wordpress.com)

[Question] How should Canada Negotiate with Trump on Tariffs?

Davey, 16 Jul 2025 15:56 UTC
1 point
2 comments, 1 min read

[Question] Why haven’t we auto-translated all AI alignment content?

Algon, 16 Jul 2025 15:33 UTC
22 points
10 comments, 1 min read

A Hallucination Filter Idea That Might Not Scale—Yet

8harath, 16 Jul 2025 14:40 UTC
−5 points
0 comments, 2 min read

Artificial Life Research Agenda

dmac_93, 16 Jul 2025 13:23 UTC
−11 points
0 comments, 1 min read

On being sort of back and sort of new here

Loki zen, 16 Jul 2025 12:55 UTC
32 points
13 comments, 3 min read

Conway’s Game of Life—complexity emerges from simplicity

James Stephen Brown, 16 Jul 2025 4:42 UTC
3 points
0 comments, 2 min read
(nonzerosum.games)

Emergent Price-Fixing by LLM Auction Agents

Lech Mazur, 16 Jul 2025 2:45 UTC
13 points
0 comments, 9 min read

Mapping Mental Moves

Jordan Rubin, 16 Jul 2025 2:28 UTC
3 points
0 comments, 2 min read
(jordanmrubin.substack.com)

Defining Monitorable and Useful Goals

Rubi J. Hudson, 15 Jul 2025 23:06 UTC
11 points
0 comments, 16 min read

[Question] Do you have any recommendations for readings on global risk forecasting and analysis applied to public policy design on a slightly smaller scale, or for more specific objectives?

Ana Lopez, 15 Jul 2025 22:00 UTC
1 point
0 comments, 1 min read

1 week fast on livestream for AI xrisk

samuelshadrach, 15 Jul 2025 21:36 UTC
1 point
2 comments, 1 min read

AISN #59: EU Publishes General-Purpose AI Code of Practice

15 Jul 2025 18:59 UTC
10 points
0 comments, 4 min read
(aisafety.substack.com)

Principles for Picking Practical Interpretability Projects

Sam Marks, 15 Jul 2025 17:38 UTC
27 points
0 comments, 13 min read

Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety

15 Jul 2025 16:23 UTC
166 points
32 comments, 1 min read
(bit.ly)

The Virtue of Fear and the Myth of “Fearlessness”

David_Veksler, 15 Jul 2025 16:10 UTC
7 points
3 comments, 1 min read

Grok 4 Various Things

Zvi, 15 Jul 2025 15:50 UTC
50 points
4 comments, 32 min read
(thezvi.wordpress.com)

Value systems of the frontier AIs, reduced to slogans

Mitchell_Porter, 15 Jul 2025 15:10 UTC
4 points
0 comments, 1 min read

What is David Chapman talking about when he talks about “meaning” in his book “Meaningness”?

SpectrumDT, 15 Jul 2025 14:29 UTC
22 points
15 comments, 2 min read

Why Eliminating Deception Won’t Align AI

Priyanka Bharadwaj, 15 Jul 2025 9:21 UTC
19 points
6 comments, 4 min read

Generalizing zombie arguments

jessicata, 15 Jul 2025 5:09 UTC
23 points
9 comments, 7 min read
(unstableontology.com)

Do confident short timelines make sense?

15 Jul 2025 3:37 UTC
138 points
76 comments, 69 min read

Critic Contributions Are Logically Irrelevant

Zack_M_Davis, 15 Jul 2025 1:03 UTC
27 points
74 comments, 6 min read

AISafety.com Hackathon 2025

Bryce Robertson, 15 Jul 2025 0:04 UTC
12 points
0 comments, 1 min read

Don’t Say “I Want to Work In AI Policy”

henryj, 14 Jul 2025 23:19 UTC
5 points
0 comments, 2 min read
(www.henryjosephson.com)

Recent Redwood Research project proposals

14 Jul 2025 22:27 UTC
91 points
0 comments, 3 min read

The Role of Respect: Why we inevitably appeal to authority

jimmy, 14 Jul 2025 21:28 UTC
18 points
2 comments, 12 min read