The Stag Hunt—cultivating cooperation to reap rewards
James Stephen Brown · Feb 25, 2025, 11:45 PM · 7 points · 0 comments · 4 min read · (nonzerosum.games)
Three Levels for Large Language Model Cognition
Eleni Angelou · Feb 25, 2025, 11:14 PM · 21 points · 0 comments · 5 min read
[Crosspost] Strategic wealth accumulation under transformative AI expectations
arden446 and CalebMaresca · Feb 25, 2025, 9:50 PM · 5 points · 0 comments · 17 min read · (forum.effectivealtruism.org)
Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs
Jan Betley and Owain_Evans · Feb 25, 2025, 5:39 PM · 329 points · 91 comments · 4 min read
We Can Build Compassionate AI
Gordon Seidoh Worley · Feb 25, 2025, 4:37 PM · 9 points · 6 comments · 4 min read · (uncertainupdates.substack.com)
[Question] Intellectual lifehacks repo
Antoine de Scorraille · Feb 25, 2025, 4:32 PM · 11 points · 15 comments · 1 min read
Economics Roundup #5
Zvi · Feb 25, 2025, 1:40 PM · 27 points · 10 comments · 20 min read · (thezvi.wordpress.com)
Making alignment a law of the universe
Richard Juggins · Feb 25, 2025, 10:44 AM · 0 points · 3 comments · 15 min read
Revisiting Conway’s Law
annebrandes · Feb 25, 2025, 8:33 AM · 12 points · 4 comments · 3 min read
Demystifying the Pinocchio Paradox
Novak Zukowski · Feb 25, 2025, 6:16 AM · −1 points · 0 comments · 3 min read
Technical comparison of Deepseek, Novasky, S1, Helix, P0
Juliezhanggg · Feb 25, 2025, 4:20 AM · 8 points · 0 comments · 5 min read
Upcoming Protest for AI Safety
Matt Vincent · Feb 25, 2025, 3:04 AM · 12 points · 0 comments · 1 min read · (www.pauseai-us.org)
what an efficient market feels from inside
DMMF · Feb 25, 2025, 2:38 AM · 40 points · 9 comments · 6 min read · (danfrank.ca)
Metacompilation
Donald Hobson · Feb 24, 2025, 10:58 PM · 11 points · 1 comment · 4 min read
The manifest manifesto
dkl9 · Feb 24, 2025, 10:13 PM · 6 points · 2 comments · 2 min read · (dkl9.net)
Credit Suisse collapse obfuscated Parreaux, Thiébaud & Partners scandal
pocock · Feb 24, 2025, 9:28 PM · 3 points · 0 comments · 1 min read · (juristgate.com)
Topological Data Analysis and Mechanistic Interpretability
Gunnar Carlsson · Feb 24, 2025, 7:56 PM · 16 points · 4 comments · 7 min read
Zizian comparisons / connections in the open source & Linux communities
pocock · Feb 24, 2025, 7:55 PM · −15 points · 0 comments · 1 min read
Local Trust
ben_levinstein, Daniel Herrmann and Aydin Mohseni · Feb 24, 2025, 7:53 PM · 21 points · 4 comments · 5 min read
Nationwide Action Workshop: Contact Congress about AI safety!
Felix De Simone · Feb 24, 2025, 7:36 PM · 7 points · 0 comments · 1 min read
Anthropic releases Claude 3.7 Sonnet with extended thinking mode
LawrenceC · Feb 24, 2025, 7:32 PM · 88 points · 8 comments · 4 min read · (www.anthropic.com)
Training AI to do alignment research we don’t already know how to do
joshc · Feb 24, 2025, 7:19 PM · 45 points · 23 comments · 7 min read
Conference Report: Threshold 2030 - Modeling AI Economic Futures
Deric Cheng, Justin Bullock, Deger Turan and Elliot Mckernon · Feb 24, 2025, 6:56 PM · 51 points · 0 comments · 10 min read · (www.convergenceanalysis.org)
Evaluating “What 2026 Looks Like” So Far
Jonny Spicer · Feb 24, 2025, 6:55 PM · 77 points · 5 comments · 7 min read
Superintelligent Agents Pose Catastrophic Risks: Can Scientist AI Offer a Safer Path?
Yoshua Bengio, Jesse Richardson, dwk and mattmacdermott · Feb 24, 2025, 6:31 PM · 44 points · 15 comments · 11 min read
Understanding Agent Preferences
martinkunev · Feb 24, 2025, 5:46 PM · 6 points · 2 comments · 14 min read
What We Can Do to Prevent Extinction by AI
Joe Rogero · Feb 24, 2025, 5:15 PM · 12 points · 0 comments
Dream, Truth, & Good
abramdemski · Feb 24, 2025, 4:59 PM · 50 points · 11 comments · 4 min read
Forecasting Frontier Language Model Agent Capabilities
Govind Pimpale, Axel Højmark, Jérémy Scheurer and Marius Hobbhahn · Feb 24, 2025, 4:51 PM · 35 points · 0 comments · 5 min read · (www.apolloresearch.ai)
A City Within a City
Declan Molony · Feb 24, 2025, 3:51 PM · 48 points · 1 comment · 7 min read
Grok Grok
Zvi · Feb 24, 2025, 2:20 PM · 36 points · 2 comments · 19 min read · (thezvi.wordpress.com)
if you’re not happy single, you won’t be happy immortal
daijin · Feb 24, 2025, 1:23 PM · 2 points · 1 comment · 1 min read
[NSFW] The Fuzzy Handcuffs of Liberation
lsusr · Feb 24, 2025, 1:05 PM · 27 points · 11 comments · 2 min read
Dayton, Ohio, HPMOR 10 year Anniversary meetup
Lunawarrior · Feb 24, 2025, 12:55 PM · 1 point · 0 comments · 1 min read
An Alternate History of the Future, 2025-2040
Mr Beastly · Feb 24, 2025, 5:53 AM · 3 points · 5 comments · 10 min read
Export Surplusses
lsusr · Feb 24, 2025, 5:53 AM · 24 points · 21 comments · 3 min read
AI alignment for mental health supports
hiki_t · Feb 24, 2025, 4:21 AM · 1 point · 1 comment · 1 min read
The GDM AGI Safety+Alignment Team is Hiring for Applied Interpretability Research
Arthur Conmy and Neel Nanda · Feb 24, 2025, 2:17 AM · 48 points · 1 comment · 7 min read
Poll on AI opinions.
Niclas Kupper · Feb 23, 2025, 10:39 PM · 1 point · 2 comments · 1 min read
The Geometry of Linear Regression versus PCA
criticalpoints · Feb 23, 2025, 9:01 PM · 20 points · 7 comments · 6 min read · (eregis.github.io)
Judgements: Merging Prediction & Evidence
abramdemski · Feb 23, 2025, 7:35 PM · 103 points · 5 comments · 6 min read
Intelligence as Privilege Escalation
Cole Wyeth · Feb 23, 2025, 7:31 PM · 28 points · 0 comments · 5 min read
[Question] Have LLMs Generated Novel Insights?
abramdemski and Cole Wyeth · 23 Feb 2025 18:22 UTC · 159 points · 41 comments · 2 min read
The case for corporal punishment
Yair Halberstadt · 23 Feb 2025 15:05 UTC · 27 points · 4 comments · 2 min read
Reflections on the state of the race to superintelligence, February 2025
Mitchell_Porter · 23 Feb 2025 13:58 UTC · 21 points · 7 comments · 4 min read
List of most interesting ideas I encountered in my life, ranked
Lucien · 23 Feb 2025 12:36 UTC · 21 points · 6 comments · 1 min read
Test of the Bene Gesserit
lsusr · 23 Feb 2025 11:51 UTC · 19 points · 3 comments · 3 min read
Moral gauge theory: A speculative suggestion for AI alignment
James Diacoumis · 23 Feb 2025 11:42 UTC · 6 points · 2 comments · 8 min read
[Question] Does human (mis)alignment pose a significant and imminent existential threat?
jr · 23 Feb 2025 10:03 UTC · 6 points · 3 comments · 1 min read
Deep sparse autoencoders yield interpretable features too
Armaan A. Abraham · 23 Feb 2025 5:46 UTC · 29 points · 8 comments · 8 min read