[Question] How could I tell someone that consciousness is not the primary concern of AI Safety?

Lysandre Terrisse 13 Jun 2025 22:44 UTC

11 points

2 comments 3 min read LW link

Debate experiments at The Curve, LessOnline and Manifest

Nathan Young 13 Jun 2025 22:35 UTC

35 points

12 comments 5 min read LW link

(nathanpmyoung.substack.com)

Futarchy’s fundamental flaw

dynomight 13 Jun 2025 22:08 UTC

186 points

52 comments 9 min read LW link

(dynomight.net)

The Pros and Cons of Being Among Your Tribe

Sable 13 Jun 2025 21:41 UTC

39 points

0 comments 7 min read LW link

(affablyevil.substack.com)

Constraining Minds, Not Goals: A Structural Approach to AI Alignment

Johannes C. Mayer 13 Jun 2025 21:06 UTC

25 points

0 comments 9 min read LW link

On Pruning an Overgrown Garden

Vaatzes 13 Jun 2025 17:54 UTC

3 points

3 comments 6 min read LW link

Learned helplessness about “teaching to the test”

Viliam 13 Jun 2025 17:53 UTC

36 points

16 comments 3 min read LW link

Information-Dense Conference Badges

ozziegooen 13 Jun 2025 17:52 UTC

28 points

4 comments 4 min read LW link

(ozziegooen.substack.com)

The Superwisdom Thesis: Why Superintelligence Does Not Pose An Existential Threat

Max Abecassis 13 Jun 2025 17:35 UTC

−23 points

9 comments 30 min read LW link

The Boat Theft Theory of Consciousness

Lorec 13 Jun 2025 16:38 UTC

43 points

36 comments 2 min read LW link

Monthly Roundup #31: June 2025

Zvi 13 Jun 2025 16:20 UTC

37 points

3 comments 50 min read LW link

(thezvi.wordpress.com)

Unsupervised Elicitation of Language Models

Jiaxin Wen, Peter Hase, Sam Marks, Collin, Ethan Perez and janleike

13 Jun 2025 16:15 UTC

57 points

12 comments 2 min read LW link

Lucky Omega Problem

Tapatakt 13 Jun 2025 14:54 UTC

10 points

4 comments 4 min read LW link

Distillation Robustifies Unlearning

Bruce W. Lee, Addie Foote, alexinf, leni, Jacob G-W, Harish Kamath, Bryce Woodworth, cloud and TurnTrout

13 Jun 2025 13:45 UTC

239 points

43 comments 8 min read LW link

(arxiv.org)

Self-Adapting Language Models (from MIT, arXiv preprint)

Person 13 Jun 2025 13:08 UTC

5 points

1 comment 1 min read LW link

Do Not Tile the Lightcone with Your Confused Ontology

Jan_Kulveit 13 Jun 2025 12:45 UTC

236 points

27 comments 5 min read LW link

(boundedlyrational.substack.com)

Corporations as Paperclip/Profit Maximizers

busssard 13 Jun 2025 10:55 UTC

17 points

3 comments 22 min read LW link

4. Why existing approaches to cause prioritization are not robust to unawareness

Anthony DiGiovanni 13 Jun 2025 8:55 UTC

26 points

0 comments 16 min read LW link

[Question] Under what conditions should humans stop pursuing technical AI safety careers?

S. Alex Bradt 13 Jun 2025 5:56 UTC

6 points

0 comments 1 min read LW link

[linkpost] AI Alignment is About Culture, Not Control by JCorvinus

Milan W 13 Jun 2025 0:07 UTC

1 point

8 comments 1 min read LW link

(jcorvinus.medium.com)

Forecast AI 2027

ChristianWilliams 12 Jun 2025 21:12 UTC

20 points

0 comments 1 min read LW link

(www.metaculus.com)

CRMArena-Pro: Holistic Assessment of LLM Agents Across Diverse Business Scenarios and Interactions

Annapurna 12 Jun 2025 19:53 UTC

8 points

0 comments 1 min read LW link

(arxiv.org)

When does training a model change its goals?

Vivek Hebbar and ryan_greenblatt

12 Jun 2025 18:43 UTC

79 points

3 comments 15 min read LW link

Restraining Factors in AI Alignment Systems

theophilus tabuke 12 Jun 2025 18:17 UTC

1 point

1 comment 1 min read LW link

Analysis of Automated Prompt Engineering for Forecasting

ChristianWilliams 12 Jun 2025 15:49 UTC

6 points

0 comments 7 min read LW link

(www.metaculus.com)

AI #120: While o3 Turned Pro

Zvi 12 Jun 2025 15:30 UTC

51 points

3 comments 53 min read LW link

(thezvi.wordpress.com)

Towards mutually assured cooperation

mikko 12 Jun 2025 15:15 UTC

5 points

0 comments 1 min read LW link

What If We Could Monitor Human Intent?

Saif Khan 12 Jun 2025 8:51 UTC

−8 points

6 comments 3 min read LW link

The Way of a Skeptic

Martin Sustrik 12 Jun 2025 5:40 UTC

38 points

2 comments 6 min read LW link

(www.250bpm.com)

[Question] When should you read a biography?

CstineSublime 12 Jun 2025 5:19 UTC

3 points

6 comments 3 min read LW link

An Easily Overlooked Post on the Automation of Wisdom and Philosophy

Chris_Leong 12 Jun 2025 2:54 UTC

19 points

0 comments 1 min read LW link

(blog.aiimpacts.org)

Maybe Social Anxiety Is Just You Failing At Mind Control

25Hour 11 Jun 2025 23:49 UTC

85 points

21 comments 16 min read LW link

OpenAI now has an RL API which is broadly accessible

ryan_greenblatt 11 Jun 2025 23:39 UTC

44 points

1 comment 5 min read LW link

So You Want to Work at a Frontier AI Lab

Joe Rogero 11 Jun 2025 23:11 UTC

54 points

14 comments 7 min read LW link

(intelligence.org)

Commentary On The Turing Apocrypha

jdp 11 Jun 2025 22:52 UTC

26 points

0 comments 11 min read LW link

(minihf.com)

[Question] My friend wants a good book recommendation to understand AI, AI safety, and the field, and probably the drama. He’s smart but non-technical and not keeping up with trends. Any recs?

JohnGreer 11 Jun 2025 22:32 UTC

9 points

0 comments 1 min read LW link

A Revision to Market Monetarism: Individual Hoarding as Rational, Competition for Dollars as Zero-Sum?

Lorec 11 Jun 2025 20:13 UTC

4 points

0 comments 4 min read LW link

Investigating Accidental Misalignment: Causal Effects of Fine-Tuning Data on Model Vulnerability

Zhijing Jin, Punya Syon Pandey, samuelsimko and Kellin Pelrine

11 Jun 2025 19:30 UTC

6 points

0 comments 5 min read LW link

The Dream of a Gentle Singularity

Zvi 11 Jun 2025 19:30 UTC

57 points

7 comments 12 min read LW link

(thezvi.wordpress.com)

Beware General Claims about “Generalizable Reasoning Capabilities” (of Modern AI Systems)

LawrenceC 11 Jun 2025 19:27 UTC

318 points

19 comments 16 min read LW link

Religion for Rationalists

Gordon Seidoh Worley 11 Jun 2025 19:05 UTC

27 points

65 comments 4 min read LW link

Difficulties of Eschatological policy making [Linkpost]

Noosphere89 11 Jun 2025 14:12 UTC

11 points

3 comments 3 min read LW link

(jack-clark.net)

Hydra

Matrice Jacobine 11 Jun 2025 14:07 UTC

24 points

0 comments 1 min read LW link

(philosophybear.substack.com)

SafeRLHub: An Interactive Resource for RL Safety and Interpretability

Siya and deneille

11 Jun 2025 5:47 UTC

11 points

0 comments 7 min read LW link

More on policy arguments and the AB problem

Sniffnoy 11 Jun 2025 4:42 UTC

11 points

0 comments 4 min read LW link

Using AI Video Generation to Re-create Memories

Annapurna 11 Jun 2025 4:06 UTC

−1 points

2 comments 1 min read LW link

Conflicted on AI Politics

jefftk 11 Jun 2025 3:40 UTC

27 points

5 comments 2 min read LW link

(www.jefftk.com)

the void

nostalgebraist 11 Jun 2025 3:19 UTC

427 points

108 comments 1 min read LW link

(nostalgebraist.tumblr.com)

$500 bounty for engagement on asymmetric AI risk

YonatanK 10 Jun 2025 21:50 UTC

23 points

14 comments 2 min read LW link

AI-2027 Response: Inter-AI Tensions, Value Distillation, US Multipolarity, & More

Gatlen Culp 10 Jun 2025 18:17 UTC

3 points

0 comments 8 min read LW link

(gatlen.blog)