Singularity Survival Guide: A Bayesian Guide for Navigating the Pre-Singularity Period

mbrooks · 28 Mar 2025 23:21 UTC
6 points
4 comments · 2 min read · LW link

Softmax, Emmett Shear’s new AI startup focused on “Organic Alignment”

Chris Lakin · 28 Mar 2025 21:23 UTC
61 points
2 comments · 1 min read · LW link
(www.corememory.com)

The Pando Problem: Rethinking AI Individuality

Jan_Kulveit · 28 Mar 2025 21:03 UTC
133 points
14 comments · 13 min read · LW link

Selection Pressures on LM Personas

Raymond Douglas · 28 Mar 2025 20:33 UTC
40 points
0 comments · 3 min read · LW link

AXRP Episode 40 - Jason Gross on Compact Proofs and Interpretability

DanielFilan · 28 Mar 2025 18:40 UTC
26 points
0 comments · 89 min read · LW link

[Question] Share AI Safety Ideas: Both Crazy and Not. №2

ank · 28 Mar 2025 17:22 UTC
2 points
10 comments · 1 min read · LW link

AI x Bio Workshop

Allison Duettmann · 28 Mar 2025 17:21 UTC
16 points
0 comments · 1 min read · LW link

[Question] How many times faster can the AGI advance the science than humans do?

StanislavKrym · 28 Mar 2025 15:16 UTC
0 points
0 comments · 1 min read · LW link

Gemini 2.5 is the New SoTA

Zvi · 28 Mar 2025 14:20 UTC
52 points
1 comment · 12 min read · LW link
(thezvi.wordpress.com)

Will the Need to Retrain AI Models from Scratch Block a Software Intelligence Explosion?

Tom Davidson · 28 Mar 2025 14:12 UTC
10 points
0 comments · 3 min read · LW link

How We Might All Die in A Year

Greg C · 28 Mar 2025 13:22 UTC
6 points
13 comments · 21 min read · LW link
(x.com)

The vision of Bill Thurston

TsviBT · 28 Mar 2025 11:45 UTC
50 points
34 comments · 4 min read · LW link

What Uniparental Disomy Tells Us About Improper Imprinting in Humans

Morpheus · 28 Mar 2025 11:24 UTC
34 points
1 comment · 6 min read · LW link
(www.tassiloneubauer.com)

Explaining British Naval Dominance During the Age of Sail

Arjun Panickssery · 28 Mar 2025 5:47 UTC
206 points
16 comments · 4 min read · LW link
(arjunpanickssery.substack.com)

Will the AGIs be able to run the civilisation?

StanislavKrym · 28 Mar 2025 4:50 UTC
−7 points
2 comments · 3 min read · LW link

[Question] Is AGI actually that likely to take off given the world energy consumption?

StanislavKrym · 27 Mar 2025 23:13 UTC
2 points
2 comments · 1 min read · LW link

[Linkpost] The value of initiating a pursuit in temporal decision-making

Gunnar_Zarncke · 27 Mar 2025 21:47 UTC
13 points
0 comments · 2 min read · LW link

Alignment through atomic agents

micseydel · 27 Mar 2025 18:43 UTC
−1 points
0 comments · 1 min read · LW link

Machines of Stolen Grace

Riley Tavassoli · 27 Mar 2025 18:15 UTC
2 points
0 comments · 5 min read · LW link

An argument for asexuality

filthy_hedonist · 27 Mar 2025 18:08 UTC
−2 points
10 comments · 1 min read · LW link

On the plausibility of a “messy” rogue AI committing human-like evil

Jacob Griffith · 27 Mar 2025 18:06 UTC
8 points
0 comments · 7 min read · LW link

AI Moral Alignment: The Most Important Goal of Our Generation

Ronen Bar · 27 Mar 2025 18:04 UTC
3 points
0 comments · 8 min read · LW link
(forum.effectivealtruism.org)

Tracing the Thoughts of a Large Language Model

Adam Jermyn · 27 Mar 2025 17:20 UTC
307 points
24 comments · 10 min read · LW link
(www.anthropic.com)

Computational Superposition in a Toy Model of the U-AND Problem

Adam Newgas · 27 Mar 2025 16:56 UTC
18 points
2 comments · 11 min read · LW link

Mistral Large 2 (123B) seems to exhibit alignment faking

27 Mar 2025 15:39 UTC
81 points
4 comments · 13 min read · LW link

AIS Netherlands is looking for a Founding Executive Director (EOI form)

27 Mar 2025 15:30 UTC
15 points
0 comments · 4 min read · LW link

AI #109: Google Fails Marketing Forever

Zvi · 27 Mar 2025 14:50 UTC
42 points
12 comments · 35 min read · LW link
(thezvi.wordpress.com)

What life will be like for humans if aligned ASI is created

james oofou · 27 Mar 2025 10:06 UTC
5 points
6 comments · 2 min read · LW link

What is scaffolding?

27 Mar 2025 9:06 UTC
10 points
0 comments · 2 min read · LW link
(aisafety.info)

Workflow vs interface vs implementation

Sniffnoy · 27 Mar 2025 7:38 UTC
12 points
0 comments · 1 min read · LW link

Quick thoughts on the difficulty of widely conveying a non-stereotyped position

Sniffnoy · 27 Mar 2025 7:30 UTC
12 points
0 comments · 5 min read · LW link

Doing principle-of-charity better

Sniffnoy · 27 Mar 2025 5:19 UTC
22 points
1 comment · 3 min read · LW link

X as phenomenon vs as policy, Goodhart, and the AB problem

Sniffnoy · 27 Mar 2025 4:32 UTC
14 points
0 comments · 2 min read · LW link

Consequentialism is for making decisions

Sniffnoy · 27 Mar 2025 4:00 UTC
21 points
9 comments · 1 min read · LW link

Third-wave AI safety needs sociopolitical thinking

Richard_Ngo · 27 Mar 2025 0:55 UTC
100 points
23 comments · 26 min read · LW link

Knowledge, Reasoning, and Superintelligence

owencb · 26 Mar 2025 23:28 UTC
21 points
1 comment · 7 min read · LW link
(strangecities.substack.com)

Many Common Problems are NP-Hard, and Why that Matters for AI

Andrew Keenan Richardson · 26 Mar 2025 21:51 UTC
5 points
9 comments · 5 min read · LW link

Fun With GPT-4o Image Generation

Zvi · 26 Mar 2025 19:50 UTC
76 points
3 comments · 15 min read · LW link
(thezvi.wordpress.com)

I’m hiring a Research Assistant for a nonfiction book on AI!

garrison · 26 Mar 2025 19:46 UTC
17 points
0 comments · 1 min read · LW link
(garrisonlovely.substack.com)

Automated Researchers Can Subtly Sandbag

26 Mar 2025 19:13 UTC
44 points
0 comments · 4 min read · LW link
(alignment.anthropic.com)

Negative Results for SAEs On Downstream Tasks and Deprioritising SAE Research (GDM Mech Interp Team Progress Update #2)

26 Mar 2025 19:07 UTC
115 points
15 comments · 29 min read · LW link
(deepmindsafetyresearch.medium.com)

AI companies should be safety-testing the most capable versions of their models

sjadler · 26 Mar 2025 19:03 UTC
17 points
6 comments · 1 min read · LW link
(stevenadler.substack.com)

Conceptual Rounding Errors

Jan_Kulveit · 26 Mar 2025 19:00 UTC
153 points
15 comments · 3 min read · LW link
(boundedlyrational.substack.com)

Personal Agents: The First Step in Emergent AI Society

Andrey Seryakov · 26 Mar 2025 18:55 UTC
3 points
0 comments · 2 min read · LW link

Will AI R&D Automation Cause a Software Intelligence Explosion?

26 Mar 2025 18:12 UTC
19 points
3 comments · 2 min read · LW link
(www.forethought.org)

Why Does Unemployment Happen?

Nicholas D. · 26 Mar 2025 18:02 UTC
−2 points
2 comments · 1 min read · LW link
(nicholasdecker.substack.com)

Finding Emergent Misalignment

Jan Betley · 26 Mar 2025 17:33 UTC
33 points
0 comments · 3 min read · LW link

Center on Long-Term Risk: Summer Research Fellowship 2025 - Apply Now

Tristan Cook · 26 Mar 2025 17:29 UTC
33 points
0 comments · 1 min read · LW link
(longtermrisk.org)

Eukaryote Skips Town—Why I’m leaving DC

eukaryote · 26 Mar 2025 17:16 UTC
82 points
1 comment · 6 min read · LW link
(eukaryotewritesblog.com)

Apply to become a Futurekind AI Facilitator or Mentor (deadline: April 10)

superbeneficiary · 26 Mar 2025 15:47 UTC
4 points
0 comments · 1 min read · LW link
(forum.effectivealtruism.org)