Can time preferences make AI safe?

TerriLeaf · 15 Mar 2025 21:41 UTC
2 points
1 comment · 2 min read · LW link

Help make the orca language experiment happen

Towards_Keeperhood · 15 Mar 2025 21:39 UTC
9 points
12 comments · 5 min read · LW link

Announcing EXP: Experimental Summer Workshop on Collective Cognition

15 Mar 2025 20:14 UTC
36 points
2 comments · 4 min read · LW link

AI Self-Correction vs. Self-Reflection: Is There a Fundamental Difference?

Project Solon · 15 Mar 2025 18:24 UTC
−3 points
0 comments · 1 min read · LW link

The Fork in the Road

testingthewaters · 15 Mar 2025 17:36 UTC
14 points
12 comments · 2 min read · LW link

Any-Benefit Mindset and Any-Reason Reasoning

silentbob · 15 Mar 2025 17:10 UTC
36 points
9 comments · 6 min read · LW link

deleted

funnyfranco · 15 Mar 2025 15:24 UTC
−1 points
2 comments · 1 min read · LW link

Paper: Field-building and the epistemic culture of AI safety

peterslattery · 15 Mar 2025 12:30 UTC
13 points
3 comments · 3 min read · LW link
(firstmonday.org)

deleted

funnyfranco · 15 Mar 2025 6:08 UTC
8 points
0 comments · 1 min read · LW link

AI Says It’s Not Conscious. That’s a Bad Answer to the Wrong Question.

JohnMarkNorman · 15 Mar 2025 1:25 UTC
1 point
0 comments · 2 min read · LW link

Report & retrospective on the Dovetail fellowship

Alex_Altair · 14 Mar 2025 23:20 UTC
26 points
3 comments · 9 min read · LW link

The Dangers of Outsourcing Thinking: Losing Our Critical Thinking to the Over-Reliance on AI Decision-Making

Cameron Tomé-Moreira · 14 Mar 2025 23:07 UTC
11 points
4 comments · 8 min read · LW link

LLMs may enable direct democracy at scale

Davey Morse · 14 Mar 2025 22:51 UTC
14 points
20 comments · 1 min read · LW link

2024 Unofficial LessWrong Survey Results

Screwtape · 14 Mar 2025 22:29 UTC
110 points
28 comments · 48 min read · LW link

AI4Science: The Hidden Power of Neural Networks in Scientific Discovery

Max Ma · 14 Mar 2025 21:18 UTC
2 points
2 comments · 1 min read · LW link

What are we doing when we do mathematics?

epicurus · 14 Mar 2025 20:54 UTC
7 points
2 comments · 1 min read · LW link
(asving.com)

AI for Epistemics Hackathon

Austin Chen · 14 Mar 2025 20:46 UTC
76 points
12 comments · 10 min read · LW link
(manifund.substack.com)

Geometry of Features in Mechanistic Interpretability

Gunnar Carlsson · 14 Mar 2025 19:11 UTC
16 points
0 comments · 8 min read · LW link

AI Tools for Existential Security

14 Mar 2025 18:38 UTC
22 points
4 comments · 11 min read · LW link
(www.forethought.org)

deleted

funnyfranco · 14 Mar 2025 18:14 UTC
−3 points
2 comments · 1 min read · LW link

Minor interpretability exploration #3: Extending superposition to different activation functions (loss landscape)

Rareș Baron · 14 Mar 2025 15:45 UTC
5 points
0 comments · 3 min read · LW link

AI for AI safety

Joe Carlsmith · 14 Mar 2025 15:00 UTC
79 points
13 comments · 17 min read · LW link
(joecarlsmith.substack.com)

Evaluating the ROI of Information

Mr. Keating · 14 Mar 2025 14:22 UTC
13 points
3 comments · 3 min read · LW link

On MAIM and Superintelligence Strategy

Zvi · 14 Mar 2025 12:30 UTC
53 points
2 comments · 13 min read · LW link
(thezvi.wordpress.com)

Whether governments will control AGI is important and neglected

Seth Herd · 14 Mar 2025 9:48 UTC
28 points
2 comments · 9 min read · LW link

Something to fight for

RomanS · 14 Mar 2025 8:27 UTC
4 points
0 comments · 1 min read · LW link

Interpreting Complexity

Maxwell Adam · 14 Mar 2025 4:52 UTC
53 points
8 comments · 26 min read · LW link

Bike Lights are Cheap Enough to Give Away

jefftk · 14 Mar 2025 2:10 UTC
24 points
0 comments · 1 min read · LW link
(www.jefftk.com)

Superintelligence’s goals are likely to be random

Mikhail Samin · 13 Mar 2025 22:41 UTC
6 points
6 comments · 5 min read · LW link

Should AI safety be a mass movement?

MattAlexander · 13 Mar 2025 20:36 UTC
5 points
1 comment · 4 min read · LW link

Auditing language models for hidden objectives

13 Mar 2025 19:18 UTC
142 points
15 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

13 Mar 2025 19:09 UTC
162 points
46 comments · 6 min read · LW link

Vacuum Decay: Expert Survey Results

JessRiedel · 13 Mar 2025 18:31 UTC
96 points
26 comments · 13 min read · LW link

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

13 Mar 2025 18:29 UTC
10 points
0 comments · 1 min read · LW link
(arxiv.org)

Creating Complex Goals: A Model to Create Autonomous Agents

theraven · 13 Mar 2025 18:17 UTC
6 points
1 comment · 6 min read · LW link

Habermas Machine

NicholasKees · 13 Mar 2025 18:16 UTC
53 points
7 comments · 6 min read · LW link
(mosaic-labs.org)

The Other Alignment Problem: Maybe AI Needs Protection From Us

Peterpiper · 13 Mar 2025 18:03 UTC
−2 points
0 comments · 3 min read · LW link

AI #107: The Misplaced Hype Machine

Zvi · 13 Mar 2025 14:40 UTC
47 points
12 comments · 40 min read · LW link
(thezvi.wordpress.com)

Intelsat as a Model for International AGI Governance

13 Mar 2025 12:58 UTC
45 points
0 comments · 1 min read · LW link
(www.forethought.org)

Stacity: a Lock-In Risk Benchmark for Large Language Models

alamerton · 13 Mar 2025 12:08 UTC
4 points
0 comments · 1 min read · LW link
(huggingface.co)

The prospect of accelerated AI safety progress, including philosophical progress

Mitchell_Porter · 13 Mar 2025 10:52 UTC
11 points
0 comments · 4 min read · LW link

The “Reversal Curse”: you still aren’t anthropomorphising enough.

lumpenspace · 13 Mar 2025 10:24 UTC
3 points
0 comments · 1 min read · LW link
(lumpenspace.substack.com)

Formalizing Space-Faring Civilizations Saturation concepts and metrics

Maxime Riché · 13 Mar 2025 9:40 UTC
4 points
0 comments · 8 min read · LW link

The Economics of p(doom)

Jakub Growiec · 13 Mar 2025 7:33 UTC
2 points
0 comments · 1 min read · LW link

Social Media: How to fix them before they become the biggest news platform

Sam G · 13 Mar 2025 7:28 UTC
5 points
2 comments · 3 min read · LW link

Penny Whistle in E?

jefftk · 13 Mar 2025 2:40 UTC
9 points
1 comment · 1 min read · LW link
(www.jefftk.com)

Anthropic, and taking “technical philosophy” more seriously

Raemon · 13 Mar 2025 1:48 UTC
139 points
29 comments · 11 min read · LW link

LW/ACX Social Meetup

Stefan · 12 Mar 2025 23:13 UTC
2 points
0 comments · 1 min read · LW link

I grade every NBA basketball game I watch based on enjoyability

proshowersinger · 12 Mar 2025 21:46 UTC
24 points
2 comments · 4 min read · LW link

Kairos is hiring a Head of Operations/Founding Generalist

agucova · 12 Mar 2025 20:58 UTC
6 points
0 comments · 5 min read · LW link