Memory Decoding Journal Club: Neocortical synaptic engrams for remote contextual memories

Devin Ward · 15 Jun 2025 23:22 UTC
1 point
0 comments · 1 min read · LW link

Every Major LLM Endorses Newcomb One-Boxing

jackmastermind · 15 Jun 2025 20:44 UTC
19 points
13 comments · 1 min read · LW link
(jacktlab.substack.com)

FDT Does Not Endorse Itself in Asymmetric Games

jackmastermind · 15 Jun 2025 20:44 UTC
23 points
3 comments · 5 min read · LW link

Can We Change the Goals of a Toy RL Agent?

15 Jun 2025 20:34 UTC
20 points
0 comments · 9 min read · LW link

Some reprogenetics-related projects you could help with

TsviBT · 15 Jun 2025 20:25 UTC
80 points
1 comment · 4 min read · LW link

Risk Tokens: Economic Security in AI Safety

mhdempsey · 15 Jun 2025 19:25 UTC
1 point
0 comments · 6 min read · LW link
(www.michaeldempsey.me)

Aligned monetization of modern dating

kwang · 15 Jun 2025 16:01 UTC
0 points
0 comments · 3 min read · LW link
(kevw.substack.com)

Intelligence Is Not Magic, But Your Threshold For “Magic” Is Pretty Low

Expertium · 15 Jun 2025 15:23 UTC
215 points
27 comments · 1 min read · LW link

Estrogen: A trip report

cube_flipper · 15 Jun 2025 13:15 UTC
167 points
42 comments · 27 min read · LW link
(smoothbrains.net)

[Question] Do multimodal LLMs (like 4o) use OCR under the hood to read dense text in images?

2PuNCheeZ · 15 Jun 2025 11:20 UTC
4 points
1 comment · 1 min read · LW link

Book review: Air-borne by Carl Zimmer

eukaryote · 15 Jun 2025 5:49 UTC
34 points
0 comments · 11 min read · LW link
(eukaryotewritesblog.com)

My favorite Soviet songs

Nina Panickssery · 15 Jun 2025 2:48 UTC
22 points
1 comment · 5 min read · LW link
(ninapanickssery.substack.com)

Side quests in curriculum learning and regularization

Sandy Fraser · 15 Jun 2025 2:03 UTC
5 points
0 comments · 10 min read · LW link

AXRP Episode 43 - David Lindner on Myopic Optimization with Non-myopic Approval

DanielFilan · 15 Jun 2025 1:20 UTC
12 points
0 comments · 56 min read · LW link

Jailbreaking Claude 4 and Other Frontier Language Models

James Sullivan · 15 Jun 2025 0:31 UTC
1 point
0 comments · 3 min read · LW link
(open.substack.com)

Endometriosis is an incredibly interesting disease

Abhishaike Mahajan · 14 Jun 2025 22:14 UTC
166 points
5 comments · 16 min read · LW link
(www.owlposting.com)

Field Notes from Shipping Real Code with Claude

creatorrr · 14 Jun 2025 16:36 UTC
22 points
0 comments · 12 min read · LW link
(diwank.space)

Training Superior Sparse Autoencoders for Instruct Models

Haoran Ye · 14 Jun 2025 16:35 UTC
4 points
0 comments · 7 min read · LW link

Foresight Institute AI safety RFPs in automation, security, multi-agent, neuro

Allison Duettmann · 14 Jun 2025 16:29 UTC
6 points
0 comments · 2 min read · LW link

A Very Simple Case For Giving To Shrimp

Bentham's Bulldog · 14 Jun 2025 15:31 UTC
−6 points
1 comment · 3 min read · LW link

Why we’re still doing normal school

juliawise · 14 Jun 2025 12:40 UTC
85 points
0 comments · 3 min read · LW link

What Caused the Fertility Collapse?

Zero Contradictions · 14 Jun 2025 7:15 UTC
−3 points
2 comments · 4 min read · LW link
(expandingrationality.substack.com)

Relocation triggers

denkenberger · 14 Jun 2025 6:36 UTC
2 points
0 comments · 1 min read · LW link

Memory Decoding Journal Club: Neocortical synaptic engrams for remote contextual memories

Devin Ward · 14 Jun 2025 2:26 UTC
1 point
0 comments · 1 min read · LW link

[Question] How concerned are you about a fast takeoff due to a leap in hardware usage?

MichaelDickens · 14 Jun 2025 1:15 UTC
9 points
7 comments · 1 min read · LW link

[Question] How could I tell someone that consciousness is not the primary concern of AI Safety?

Lysandre Terrisse · 13 Jun 2025 22:44 UTC
11 points
2 comments · 3 min read · LW link

Debate experiments at The Curve, LessOnline and Manifest

Nathan Young · 13 Jun 2025 22:35 UTC
36 points
12 comments · 5 min read · LW link
(nathanpmyoung.substack.com)

Futarchy’s fundamental flaw

dynomight · 13 Jun 2025 22:08 UTC
178 points
49 comments · 9 min read · LW link
(dynomight.net)

The Pros and Cons of Being Among Your Tribe

Sable · 13 Jun 2025 21:41 UTC
32 points
0 comments · 7 min read · LW link
(affablyevil.substack.com)

Constraining Minds, Not Goals: A Structural Approach to AI Alignment

Johannes C. Mayer · 13 Jun 2025 21:06 UTC
25 points
0 comments · 9 min read · LW link

The optimal level of optimization is suboptimal

ellifournier · 13 Jun 2025 18:06 UTC
4 points
4 comments · 1 min read · LW link
(ellifournier.substack.com)

On Pruning an Overgrown Garden

Vaatzes · 13 Jun 2025 17:54 UTC
3 points
3 comments · 6 min read · LW link

Learned helplessness about “teaching to the test”

Viliam · 13 Jun 2025 17:53 UTC
36 points
16 comments · 3 min read · LW link

Information-Dense Conference Badges

ozziegooen · 13 Jun 2025 17:52 UTC
28 points
4 comments · 4 min read · LW link
(ozziegooen.substack.com)

The Superwisdom Thesis: Why Superintelligence Does Not Pose An Existential Threat

Max Abecassis · 13 Jun 2025 17:35 UTC
−23 points
9 comments · 30 min read · LW link

The Boat Theft Theory of Consciousness

Lorec · 13 Jun 2025 16:38 UTC
41 points
36 comments · 2 min read · LW link

Monthly Roundup #31: June 2025

Zvi · 13 Jun 2025 16:20 UTC
37 points
3 comments · 50 min read · LW link
(thezvi.wordpress.com)

Unsupervised Elicitation of Language Models

13 Jun 2025 16:15 UTC
57 points
12 comments · 2 min read · LW link

Lucky Omega Problem

Tapatakt · 13 Jun 2025 14:54 UTC
10 points
4 comments · 4 min read · LW link

Distillation Robustifies Unlearning

13 Jun 2025 13:45 UTC
236 points
43 comments · 8 min read · LW link
(arxiv.org)

Self-Adapting Language Models (from MIT, arXiv preprint)

Person · 13 Jun 2025 13:08 UTC
5 points
1 comment · 1 min read · LW link

Do Not Tile the Lightcone with Your Confused Ontology

Jan_Kulveit · 13 Jun 2025 12:45 UTC
229 points
27 comments · 5 min read · LW link
(boundedlyrational.substack.com)

Corporations as Paperclip/Profit Maximizers

busssard · 13 Jun 2025 10:55 UTC
17 points
3 comments · 22 min read · LW link

4. Why existing approaches to cause prioritization are not robust to unawareness

Anthony DiGiovanni · 13 Jun 2025 8:55 UTC
26 points
0 comments · 17 min read · LW link

[Question] Under what conditions should humans stop pursuing technical AI safety careers?

S. Alex Bradt · 13 Jun 2025 5:56 UTC
6 points
0 comments · 1 min read · LW link

[linkpost] AI Alignment is About Culture, Not Control by JCorvinus

Milan W · 13 Jun 2025 0:07 UTC
1 point
8 comments · 1 min read · LW link
(jcorvinus.medium.com)

Forecast AI 2027

ChristianWilliams · 12 Jun 2025 21:12 UTC
20 points
0 comments · 1 min read · LW link
(www.metaculus.com)

CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions

Annapurna · 12 Jun 2025 19:53 UTC
8 points
0 comments · 1 min read · LW link
(arxiv.org)

When does training a model change its goals?

12 Jun 2025 18:43 UTC
78 points
3 comments · 15 min read · LW link

Restraining Factors in AI Alignment Systems

theophilus tabuke · 12 Jun 2025 18:17 UTC
1 point
1 comment · 1 min read · LW link