Archive: March 2025, Page 1
Report & retrospective on the Dovetail fellowship
Alex_Altair · Mar 14, 2025, 11:20 PM · 26 points · 3 comments · 9 min read

The Dangers of Outsourcing Thinking: Losing Our Critical Thinking to the Over-Reliance on AI Decision-Making
Cameron Tomé-Moreira · Mar 14, 2025, 11:07 PM · 11 points · 4 comments · 8 min read

LLMs may enable direct democracy at scale
Davey Morse · Mar 14, 2025, 10:51 PM · 14 points · 20 comments · 1 min read

2024 Unofficial LessWrong Survey Results
Screwtape · Mar 14, 2025, 10:29 PM · 109 points · 28 comments · 48 min read

AI4Science: The Hidden Power of Neural Networks in Scientific Discovery
Max Ma · Mar 14, 2025, 9:18 PM · 2 points · 2 comments · 1 min read

What are we doing when we do mathematics?
epicurus · Mar 14, 2025, 8:54 PM · 7 points · 1 comment · 1 min read · (asving.com)
AI for Epistemics Hackathon
Austin Chen · Mar 14, 2025, 8:46 PM · 77 points · 12 comments · 10 min read · (manifund.substack.com)

Geometry of Features in Mechanistic Interpretability
Gunnar Carlsson · Mar 14, 2025, 7:11 PM · 16 points · 0 comments · 8 min read

AI Tools for Existential Security
Lizka and owencb · Mar 14, 2025, 6:38 PM · 22 points · 4 comments · 11 min read · (www.forethought.org)

Capitalism as the Catalyst for AGI-Induced Human Extinction
funnyfranco · Mar 14, 2025, 6:14 PM · −3 points · 2 comments · 21 min read

Minor interpretability exploration #3: Extending superposition to different activation functions (loss landscape)
Rareș Baron · Mar 14, 2025, 3:45 PM · 3 points · 0 comments · 3 min read

AI for AI safety
Joe Carlsmith · Mar 14, 2025, 3:00 PM · 78 points · 13 comments · 17 min read · (joecarlsmith.substack.com)

Evaluating the ROI of Information
Declan Molony · Mar 14, 2025, 2:22 PM · 12 points · 3 comments · 3 min read

On MAIM and Superintelligence Strategy
Zvi · Mar 14, 2025, 12:30 PM · 53 points · 2 comments · 13 min read · (thezvi.wordpress.com)
Whether governments will control AGI is important and neglected
Seth Herd · Mar 14, 2025, 9:48 AM · 24 points · 2 comments · 9 min read

Something to fight for
RomanS · Mar 14, 2025, 8:27 AM · 4 points · 0 comments · 1 min read

Interpreting Complexity
Maxwell Adam · Mar 14, 2025, 4:52 AM · 53 points · 8 comments · 26 min read

Bike Lights are Cheap Enough to Give Away
jefftk · Mar 14, 2025, 2:10 AM · 24 points · 0 comments · 1 min read · (www.jefftk.com)

Superintelligence’s goals are likely to be random
Mikhail Samin · Mar 13, 2025, 10:41 PM · 6 points · 6 comments · 5 min read

Should AI safety be a mass movement?
mhampton · Mar 13, 2025, 8:36 PM · 5 points · 1 comment · 4 min read
Auditing language models for hidden objectives
Sam Marks, Johannes Treutlein, dmz, Sam Bowman, Hoagy, Carson Denison, Kei, 7vik, Akbir Khan, Austin Meek, Euan Ong, Christopher Olah, Fabien Roger, jeanne_, Meg, Drake Thomas, Adam Jermyn, Monte M and evhub · Mar 13, 2025, 7:18 PM · 141 points · 15 comments · 13 min read

Reducing LLM deception at scale with self-other overlap fine-tuning
Marc Carauleanu, Diogo de Lucena, Gunnar_Zarncke, Judd Rosenblatt, Cameron Berg, Mike Vaiana and AE Studio · Mar 13, 2025, 7:09 PM · 155 points · 41 comments · 6 min read
Vacuum Decay: Expert Survey Results
JessRiedel · Mar 13, 2025, 6:31 PM · 96 points · 26 comments

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management
simeon_c and Henry Papadatos · Mar 13, 2025, 6:29 PM · 10 points · 0 comments · 1 min read · (arxiv.org)

Creating Complex Goals: A Model to Create Autonomous Agents
theraven · Mar 13, 2025, 6:17 PM · 6 points · 1 comment · 6 min read

Habermas Machine
NicholasKees · Mar 13, 2025, 6:16 PM · 49 points · 7 comments · 6 min read · (mosaic-labs.org)

The Other Alignment Problem: Maybe AI Needs Protection From Us
Peterpiper · Mar 13, 2025, 6:03 PM · −3 points · 0 comments · 3 min read

AI #107: The Misplaced Hype Machine
Zvi · Mar 13, 2025, 2:40 PM · 47 points · 10 comments · 40 min read · (thezvi.wordpress.com)
Intelsat as a Model for International AGI Governance
rosehadshar and wdmacaskill · Mar 13, 2025, 12:58 PM · 45 points · 0 comments · 1 min read · (www.forethought.org)

Stacity: a Lock-In Risk Benchmark for Large Language Models
alamerton · Mar 13, 2025, 12:08 PM · 4 points · 0 comments · 1 min read · (huggingface.co)

The prospect of accelerated AI safety progress, including philosophical progress
Mitchell_Porter · Mar 13, 2025, 10:52 AM · 11 points · 0 comments · 4 min read
The “Reversal Curse”: you still aren’t anthropomorphising enough.
lumpenspace · Mar 13, 2025, 10:24 AM · 3 points · 0 comments · 1 min read · (lumpenspace.substack.com)
Formalizing Space-Faring Civilizations Saturation concepts and metrics
Maxime Riché · Mar 13, 2025, 9:40 AM · 4 points · 0 comments · 8 min read

The Economics of p(doom)
Jakub Growiec · Mar 13, 2025, 7:33 AM · 2 points · 0 comments · 1 min read

Social Media: How to fix them before they become the biggest news platform
Sam G · Mar 13, 2025, 7:28 AM · 5 points · 2 comments · 3 min read

Penny Whistle in E?
jefftk · Mar 13, 2025, 2:40 AM · 9 points · 1 comment · 1 min read · (www.jefftk.com)

Anthropic, and taking “technical philosophy” more seriously
Raemon · Mar 13, 2025, 1:48 AM · 125 points · 29 comments · 11 min read

LW/ACX Social Meetup
Stefan · Mar 12, 2025, 11:13 PM · 2 points · 0 comments · 1 min read

I grade every NBA basketball game I watch based on enjoyability
proshowersinger · Mar 12, 2025, 9:46 PM · 24 points · 2 comments · 4 min read

Kairos is hiring a Head of Operations/Founding Generalist
agucova · Mar 12, 2025, 8:58 PM · 6 points · 0 comments
USAID Outlook: A Metaculus Forecasting Series
ChristianWilliams · Mar 12, 2025, 8:34 PM · 9 points · 0 comments · (www.metaculus.com)

What is instrumental convergence?
Vishakha and Algon · Mar 12, 2025, 8:28 PM · 2 points · 0 comments · 2 min read · (aisafety.info)

Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs
Sanyu Rajakumar · Mar 12, 2025, 5:56 PM · 16 points · 0 comments · 13 min read

Why Obedient AI May Be the Real Catastrophe
G~ · Mar 12, 2025, 5:50 PM · 5 points · 2 comments · 3 min read

Your Communication Preferences Aren’t Law
Jonathan Moregård · Mar 12, 2025, 5:20 PM · 25 points · 4 comments · 1 min read · (honestliving.substack.com)

Reflections on Neuralese
Alice Blair · Mar 12, 2025, 4:29 PM · 28 points · 0 comments · 5 min read

Field tests of semi-rationality in Brazilian military training
P. João · Mar 12, 2025, 4:14 PM · 31 points · 0 comments · 2 min read

Many life-saving drugs fail for lack of funding. But there’s a solution: desperate rich people
Mvolz · Mar 12, 2025, 3:24 PM · 17 points · 0 comments · 1 min read · (www.theguardian.com)

The Most Forbidden Technique
Zvi · Mar 12, 2025, 1:20 PM · 143 points · 9 comments · 17 min read · (thezvi.wordpress.com)

You don’t actually need a physical multiverse to explain anthropic fine-tuning.
Fraser · Mar 12, 2025, 7:33 AM · 7 points · 8 comments · 3 min read · (frvser.com)