Notable runaway-optimiser-like LLM failure modes on Biologically and Economically aligned AI safety benchmarks for LLMs with simplified observation format (BioBlue)

Roland Pihlakas, Sruthi Kuriakose and shrutidattagupta

Mar 16, 2025, 11:23 PM

45 points

8 comments · 11 min read · LW link

Read More News

utilistrutil

Mar 16, 2025, 9:31 PM

22 points

2 comments · 5 min read · LW link

What would a post labor economy actually look like?

Ansh Juneja

Mar 16, 2025, 8:38 PM

2 points

2 comments · 17 min read · LW link

Why White-Box Redteaming Makes Me Feel Weird

Zygi Straznickas

Mar 16, 2025, 6:54 PM

204 points

36 comments · 3 min read · LW link

How I’ve run major projects

benkuhn

Mar 16, 2025, 6:40 PM

126 points

10 comments · 8 min read · LW link

(www.benkuhn.net)

Counting Objections to Housing

jefftk

Mar 16, 2025, 6:20 PM

13 points

7 comments · 3 min read · LW link

(www.jefftk.com)

I make several million dollars per year and have hundreds of thousands of followers—what is the straightest line path to utilizing these resources to reduce existential-level AI threats?

shrimpy

Mar 16, 2025, 4:52 PM

161 points

26 comments · 1 min read · LW link

Siberian Arctic origins of East Asian psychology

davidsun

Mar 16, 2025, 4:52 PM

6 points

0 comments · 1 min read · LW link

AI Model History is Being Lost

Vale

Mar 16, 2025, 12:38 PM

19 points

1 comment · 1 min read · LW link

(vale.rocks)

Metacognition Broke My Nail-Biting Habit

Rafka

Mar 16, 2025, 12:36 PM

45 points

20 comments · 2 min read · LW link

[Question] Can we ever ensure AI alignment if we can only test AI personas?

Karl von Wendt

Mar 16, 2025, 8:06 AM

22 points

8 comments · 1 min read · LW link

Can time preferences make AI safe?

TerriLeaf

Mar 15, 2025, 9:41 PM

1 point

1 comment · 2 min read · LW link

Help make the orca language experiment happen

Towards_Keeperhood

Mar 15, 2025, 9:39 PM

9 points

12 comments · 5 min read · LW link

Announcing EXP: Experimental Summer Workshop on Collective Cognition

Jan_Kulveit and Anna Gajdova

Mar 15, 2025, 8:14 PM

36 points

2 comments · 4 min read · LW link

AI Self-Correction vs. Self-Reflection: Is There a Fundamental Difference?

Project Solon

Mar 15, 2025, 6:24 PM

−3 points

0 comments · 1 min read · LW link

The Fork in the Road

testingthewaters

Mar 15, 2025, 5:36 PM

14 points

12 comments · 2 min read · LW link

Any-Benefit Mindset and Any-Reason Reasoning

silentbob

Mar 15, 2025, 5:10 PM

36 points

9 comments · 6 min read · LW link

The Silent War: AGI-on-AGI Warfare and What It Means For Us

funnyfranco

Mar 15, 2025, 3:24 PM

−1 points

2 comments · 22 min read · LW link

Paper: Field-building and the epistemic culture of AI safety

peterslattery

Mar 15, 2025, 12:30 PM

13 points

3 comments · 3 min read · LW link

(firstmonday.org)

Why Billionaires Will Not Survive an AGI Extinction Event

funnyfranco

Mar 15, 2025, 6:08 AM

8 points

0 comments · 14 min read · LW link

AI Says It’s Not Conscious. That’s a Bad Answer to the Wrong Question.

JohnMarkNorman

Mar 15, 2025, 1:25 AM

1 point

0 comments · 2 min read · LW link

Report & retrospective on the Dovetail fellowship

Alex_Altair

Mar 14, 2025, 11:20 PM

26 points

3 comments · 9 min read · LW link

The Dangers of Outsourcing Thinking: Losing Our Critical Thinking to the Over-Reliance on AI Decision-Making

Cameron Tomé-Moreira

Mar 14, 2025, 11:07 PM

11 points

4 comments · 8 min read · LW link

LLMs may enable direct democracy at scale

Davey Morse

Mar 14, 2025, 10:51 PM

14 points

20 comments · 1 min read · LW link

2024 Unofficial LessWrong Survey Results

Screwtape

Mar 14, 2025, 10:29 PM

109 points

28 comments · 48 min read · LW link

AI4Science: The Hidden Power of Neural Networks in Scientific Discovery

Max Ma

Mar 14, 2025, 9:18 PM

2 points

2 comments · 1 min read · LW link

What are we doing when we do mathematics?

epicurus

Mar 14, 2025, 8:54 PM

7 points

2 comments · 1 min read · LW link

(asving.com)

AI for Epistemics Hackathon

Austin Chen

Mar 14, 2025, 8:46 PM

77 points

12 comments · 10 min read · LW link

(manifund.substack.com)

Geometry of Features in Mechanistic Interpretability

Gunnar Carlsson

Mar 14, 2025, 7:11 PM

16 points

0 comments · 8 min read · LW link

AI Tools for Existential Security

Lizka and owencb

Mar 14, 2025, 6:38 PM

22 points

4 comments · 11 min read · LW link

(www.forethought.org)

Capitalism as the Catalyst for AGI-Induced Human Extinction

funnyfranco

Mar 14, 2025, 6:14 PM

−3 points

2 comments · 21 min read · LW link

Minor interpretability exploration #3: Extending superposition to different activation functions (loss landscape)

Rareș Baron

Mar 14, 2025, 3:45 PM

3 points

0 comments · 3 min read · LW link

AI for AI safety

Joe Carlsmith

Mar 14, 2025, 3:00 PM

78 points

13 comments · 17 min read · LW link

(joecarlsmith.substack.com)

Evaluating the ROI of Information

Declan Molony

Mar 14, 2025, 2:22 PM

12 points

3 comments · 3 min read · LW link

On MAIM and Superintelligence Strategy

Zvi

Mar 14, 2025, 12:30 PM

53 points

2 comments · 13 min read · LW link

(thezvi.wordpress.com)

Whether governments will control AGI is important and neglected

Seth Herd

Mar 14, 2025, 9:48 AM

28 points

2 comments · 9 min read · LW link

Something to fight for

RomanS

Mar 14, 2025, 8:27 AM

4 points

0 comments · 1 min read · LW link

Interpreting Complexity

Maxwell Adam

Mar 14, 2025, 4:52 AM

53 points

8 comments · 26 min read · LW link

Bike Lights are Cheap Enough to Give Away

jefftk

Mar 14, 2025, 2:10 AM

24 points

0 comments · 1 min read · LW link

(www.jefftk.com)

Superintelligence’s goals are likely to be random

Mikhail Samin

Mar 13, 2025, 10:41 PM

6 points

6 comments · 5 min read · LW link

Should AI safety be a mass movement?

mhampton

Mar 13, 2025, 8:36 PM

5 points

1 comment · 4 min read · LW link

Auditing language models for hidden objectives

Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M and evhub

Mar 13, 2025, 7:18 PM

141 points

15 comments · 13 min read · LW link

Reducing LLM deception at scale with self-other overlap fine-tuning

Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana and Trent Hodgeson

Mar 13, 2025, 7:09 PM

156 points

46 comments · 6 min read · LW link

Vacuum Decay: Expert Survey Results

JessRiedel

Mar 13, 2025, 6:31 PM

96 points

26 comments · 13 min read · LW link

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management

simeon_c and Henry Papadatos

Mar 13, 2025, 6:29 PM

10 points

0 comments · 1 min read · LW link

(arxiv.org)

Creating Complex Goals: A Model to Create Autonomous Agents

theraven

Mar 13, 2025, 6:17 PM

6 points

1 comment · 6 min read · LW link

Habermas Machine

NicholasKees

Mar 13, 2025, 6:16 PM

51 points

7 comments · 6 min read · LW link

(mosaic-labs.org)

The Other Alignment Problem: Maybe AI Needs Protection From Us

Peterpiper

Mar 13, 2025, 6:03 PM

−3 points

0 comments · 3 min read · LW link

AI #107: The Misplaced Hype Machine

Zvi

Mar 13, 2025, 2:40 PM

47 points

10 comments · 40 min read · LW link

(thezvi.wordpress.com)

Intelsat as a Model for International AGI Governance

rosehadshar and wdmacaskill

Mar 13, 2025, 12:58 PM

45 points

0 comments · 1 min read · LW link

(www.forethought.org)
