Evaluating Prediction in Acausal Mixed-Motive Settings

Tim Chan · 31 Aug 2025 22:58 UTC
14 points
0 comments · 6 min read · LW link

My AI Predictions for 2027

Taylor G. Lunt · 31 Aug 2025 22:00 UTC
37 points
73 comments · 16 min read · LW link

Hedonium is AI Alignment

31 Aug 2025 19:46 UTC
−16 points
0 comments · 6 min read · LW link

To Raemon: bet in My (personal) Goals

P. João · 31 Aug 2025 15:48 UTC
3 points
0 comments · 3 min read · LW link

Legal Personhood—The First Amendment (Part 2)

Stephen Martin · 31 Aug 2025 12:06 UTC
2 points
0 comments · 2 min read · LW link

A quantum equivalent to Bayes’ rule

dr_s · 31 Aug 2025 10:06 UTC
51 points
17 comments · 8 min read · LW link

ACX Meetup Wellington

NotEvil · 31 Aug 2025 5:13 UTC
1 point
1 comment · 1 min read · LW link

Sleeping Experts in the (reflective) Solomonoff Prior

31 Aug 2025 4:55 UTC
16 points
0 comments · 3 min read · LW link

Hacking The Spectrum For Profit (Maybe Fun)

Elek Szid · 31 Aug 2025 4:49 UTC
7 points
3 comments · 3 min read · LW link

AI agents and painted facades

30 Aug 2025 23:13 UTC
38 points
3 comments · 2 min read · LW link
(fulcrumresearch.ai)

ACX Everywhere fall 2025 - Newton, MA

duck_master · 30 Aug 2025 22:02 UTC
1 point
1 comment · 1 min read · LW link

[via bsky, found paper] “AI Consciousness: A Centrist Manifesto”

the gears to ascension · 30 Aug 2025 21:05 UTC
13 points
0 comments · 1 min read · LW link
(philpapers.org)

Female sexual attractiveness seems more egalitarian than people acknowledge

lc · 30 Aug 2025 18:09 UTC
53 points
27 comments · 3 min read · LW link

AI Sleeper Agents: How Anthropic Trains and Catches Them—Video

Writer · 30 Aug 2025 17:53 UTC
9 points
0 comments · 7 min read · LW link
(youtu.be)

Understanding LLMs: Insights from Mechanistic Interpretability

Stephen McAleese · 30 Aug 2025 16:50 UTC
40 points
2 comments · 30 min read · LW link

Legal Personhood—The First Amendment (Part 1)

Stephen Martin · 30 Aug 2025 13:20 UTC
4 points
0 comments · 3 min read · LW link

Method Iteration: An LLM Prompting Technique

Davey Morse · 30 Aug 2025 0:08 UTC
−12 points
1 comment · 2 min read · LW link

[Question] How to bet on myself? From expectations to robust goals

P. João · 29 Aug 2025 18:33 UTC
4 points
3 comments · 1 min read · LW link

AI Security London Hackathon

Prince Kumar · 29 Aug 2025 18:23 UTC
4 points
0 comments · 1 min read · LW link

Summary of our Workshop on Post-AGI Outcomes

29 Aug 2025 17:14 UTC
96 points
3 comments · 3 min read · LW link

Wikipedia, but written by AIs

Viliam · 29 Aug 2025 16:37 UTC
32 points
9 comments · 4 min read · LW link

60 U.K. Lawmakers Accuse Google of Breaking AI Safety Pledge

Joseph Miller · 29 Aug 2025 16:09 UTC
50 points
1 comment · 1 min read · LW link
(time.com)

AI #131 Part 2: Various Misaligned Things

Zvi · 29 Aug 2025 15:00 UTC
34 points
7 comments · 41 min read · LW link
(thezvi.wordpress.com)

The Gabian History of Mathematics

29 Aug 2025 13:48 UTC
21 points
9 comments · 2 min read · LW link
(cognition.cafe)

Qualified rights for AI agents

Gauraventh · 29 Aug 2025 12:42 UTC
4 points
1 comment · 5 min read · LW link
(robertandgaurav.substack.com)

I am trying to write the history of transhumanism-related communities

Ihor Kendiukhov · 29 Aug 2025 11:37 UTC
7 points
4 comments · 1 min read · LW link

Claude Plays… Whatever it Wants

Adam B · 29 Aug 2025 10:57 UTC
37 points
4 comments · 7 min read · LW link

Not stepping on bugs

Gauraventh · 29 Aug 2025 10:08 UTC
1 point
6 comments · 2 min read · LW link
(y1d2.com)

Defensiveness does not equal guilt

Kaj_Sotala · 29 Aug 2025 6:14 UTC
60 points
16 comments · 3 min read · LW link

Truth

Kabir Kumar · 28 Aug 2025 20:53 UTC
6 points
0 comments · 2 min read · LW link
(kkumar97.blogspot.com)

Here’s 18 Applications of Deception Probes

28 Aug 2025 18:59 UTC
38 points
0 comments · 22 min read · LW link

LW@Dragoncon Meetup

Error · 28 Aug 2025 18:40 UTC
7 points
0 comments · 1 min read · LW link

If we can educate AIs, why not apply that education to people? - A Simulation with Claude

P. João · 28 Aug 2025 16:37 UTC
3 points
0 comments · 7 min read · LW link

AI #131 Part 1: Gemini 2.5 Flash Image is Cool

Zvi · 28 Aug 2025 16:20 UTC
39 points
4 comments · 30 min read · LW link
(thezvi.wordpress.com)

Von Neumann’s Fallacy and You

incident-recipient · 28 Aug 2025 15:52 UTC
98 points
29 comments · 4 min read · LW link

AI misbehaviour in the wild from Andon Labs’ Safety Report

Lukas Petersson · 28 Aug 2025 15:10 UTC
39 points
0 comments · 1 min read · LW link
(andonlabs.com)

The Other Alignment Problems: How epistemic, moral and aesthetic norms get entangled

James Diacoumis · 28 Aug 2025 11:26 UTC
3 points
0 comments · 5 min read · LW link

We should think about the pivotal act again. Here’s a better version of it.

otto.barten · 28 Aug 2025 9:29 UTC
11 points
2 comments · 3 min read · LW link

Elaborative reading

DirectedEvolution · 28 Aug 2025 8:55 UTC
20 points
0 comments · 9 min read · LW link

Profanity causes emergent misalignment, but with qualitatively different results than insecure code

megasilverfist · 28 Aug 2025 8:22 UTC
21 points
2 comments · 8 min read · LW link

Using Psycholinguistic Signals to Improve AI Safety

Jkreindler · 27 Aug 2025 22:30 UTC
−2 points
0 comments · 4 min read · LW link

Transition and Social Dynamics of a post-coordination world

Lessbroken · 27 Aug 2025 22:23 UTC
1 point
0 comments · 7 min read · LW link

Technical AI Safety research taxonomy attempt (2025)

Benjamin Plaut · 27 Aug 2025 22:17 UTC
2 points
0 comments · 2 min read · LW link

The Future of AI Agents

kavya · 27 Aug 2025 21:58 UTC
6 points
8 comments · 5 min read · LW link

Against “Model Welfare” in 2025

Haley Moller · 27 Aug 2025 21:56 UTC
−10 points
8 comments · 4 min read · LW link

Are They Starting To Take Our Jobs?

Zvi · 27 Aug 2025 18:50 UTC
44 points
6 comments · 5 min read · LW link
(thezvi.wordpress.com)

Will Any Crap Cause Emergent Misalignment?

J Bostock · 27 Aug 2025 18:20 UTC
192 points
37 comments · 3 min read · LW link

Open Global Investment as a Governance Model for AGI

Nick Bostrom · 27 Aug 2025 17:42 UTC
152 points
47 comments · 39 min read · LW link
(nickbostrom.com)

Uncertain Updates August 2025

Gordon Seidoh Worley · 27 Aug 2025 17:31 UTC
11 points
1 comment · 2 min read · LW link
(uncertainupdates.substack.com)

Attaching requirements to model releases has serious downsides (relative to a different deadline for these requirements)

ryan_greenblatt · 27 Aug 2025 17:04 UTC
99 points
2 comments · 3 min read · LW link