Human-AI Relationality is Already Here | bridgebot | Feb 20, 2025, 7:08 AM | 17 points | 0 comments | 15 min read
Safe Distillation With a Powerful Untrusted AI | Alek Westover | Feb 20, 2025, 3:14 AM | 5 points | 1 comment | 5 min read
Modularity and assembly: AI safety via thinking smaller (criticalreason.substack.com) | D Wong | Feb 20, 2025, 12:58 AM | 2 points | 0 comments | 11 min read
Eliezer’s Lost Alignment Articles / The Arbital Sequence | Ruby and RobertM | Feb 20, 2025, 12:48 AM | 207 points | 10 comments | 5 min read
Arbital has been imported to LessWrong | RobertM, jimrandomh, Ben Pace and Ruby | Feb 20, 2025, 12:47 AM | 281 points | 30 comments | 5 min read
The Dilemma’s Dilemma (nonzerosum.games) | James Stephen Brown | Feb 19, 2025, 11:50 PM | 9 points | 12 comments | 7 min read
[Question] Why do we have the NATO logo? | KvmanThinking | Feb 19, 2025, 10:59 PM | 1 point | 4 comments | 1 min read
Metaculus Q4 AI Benchmarking: Bots Are Closing The Gap (www.metaculus.com) | Molly and Tom Liptay | Feb 19, 2025, 10:42 PM | 13 points | 0 comments | 13 min read
Several Arguments Against the Mathematical Universe Hypothesis (open.substack.com) | Vittu Perkele | Feb 19, 2025, 10:13 PM | −4 points | 6 comments | 3 min read
Literature Review of Text AutoEncoders | NickyP | Feb 19, 2025, 9:54 PM | 20 points | 5 comments | 8 min read
DeepSeek Made it Even Harder for US AI Companies to Ever Reach Profitability (garrisonlovely.substack.com) | garrison | Feb 19, 2025, 9:02 PM | 10 points | 1 comment | 3 min read
Won’t vs. Can’t: Sandbagging-like Behavior from Claude Models (alignment.anthropic.com) | Joe Benton and Zachary Witten | Feb 19, 2025, 8:47 PM | 15 points | 1 comment | 1 min read
AI Alignment and the Financial War Against Narcissistic Manipulation | henophilia | Feb 19, 2025, 8:42 PM | −17 points | 2 comments | 3 min read
How to Make Superbabies | GeneSmith and kman | Feb 19, 2025, 8:39 PM | 617 points | 355 comments | 31 min read
The Newbie’s Guide to Navigating AI Futures | keithjmenezes | Feb 19, 2025, 8:37 PM | −1 points | 0 comments | 40 min read
Against Unlimited Genius for Baby-Killers (ggggggggggggggggggggggg.substack.com) | ggggg | Feb 19, 2025, 8:33 PM | −7 points | 1 comment | 3 min read
New LLM Scaling Law (github.com) | wrmedford | Feb 19, 2025, 8:21 PM | 2 points | 0 comments | 1 min read
Go Grok Yourself (thezvi.wordpress.com) | Zvi | Feb 19, 2025, 8:20 PM | 57 points | 2 comments | 17 min read
[Question] Take over my project: do computable agents plan against the universal distribution pessimistically? | Cole Wyeth | Feb 19, 2025, 8:17 PM | 25 points | 3 comments | 3 min read
When should we worry about AI power-seeking? (joecarlsmith.substack.com) | Joe Carlsmith | Feb 19, 2025, 7:44 PM | 22 points | 0 comments | 18 min read
SuperBabies podcast with Gene Smith (thebayesianconspiracy.substack.com) | Eneasz | Feb 19, 2025, 7:36 PM | 35 points | 1 comment | 1 min read
Undesirable Conclusions and Origin Adjustment | Jerdle | Feb 19, 2025, 6:35 PM | 3 points | 0 comments | 5 min read
How might we safely pass the buck to AI? | joshc | Feb 19, 2025, 5:48 PM | 83 points | 58 comments | 31 min read
Using Prompt Evaluation to Combat Bio-Weapon Research | Stuart_Armstrong and rgorman | Feb 19, 2025, 12:39 PM | 11 points | 2 comments | 3 min read
Intelligence Is Jagged | Adam Train | Feb 19, 2025, 7:08 AM | 6 points | 1 comment | 3 min read
Closed-ended questions aren’t as hard as you think | electroswing | Feb 19, 2025, 3:53 AM | 6 points | 0 comments | 3 min read
Undergrad AI Safety Conference | JoNeedsSleep | Feb 19, 2025, 3:43 AM | 19 points | 0 comments | 1 min read
Permanent properties of things are a self-fulfilling prophecy | YanLyutnev | Feb 19, 2025, 12:08 AM | 4 points | 0 comments | 9 min read
Places of Loving Grace [Story] | ank | Feb 18, 2025, 11:49 PM | −1 points | 0 comments | 4 min read
Are SAE features from the Base Model still meaningful to LLaVA? (www.lesswrong.com) | Shan23Chen | Feb 18, 2025, 10:16 PM | 8 points | 2 comments | 10 min read
Sparse Autoencoder Features for Classifications and Transferability (arxiv.org) | Shan23Chen | Feb 18, 2025, 10:14 PM | 5 points | 0 comments | 1 min read
A fable on AI x-risk | bgaesop | Feb 18, 2025, 8:15 PM | 8 points | 4 comments | 1 min read
The Unearned Privilege We Rarely Discuss: Cognitive Capability | DiegoRojas | Feb 18, 2025, 8:06 PM | −21 points | 7 comments | 3 min read
Call for Applications: XLab Summer Research Fellowship | JoNeedsSleep | Feb 18, 2025, 7:19 PM | 9 points | 0 comments | 1 min read
AISN #48: Utility Engineering and EnigmaEval (newsletter.safe.ai) | Corin Katzke and Dan H | Feb 18, 2025, 7:15 PM | 4 points | 0 comments | 4 min read
Abstract Mathematical Concepts vs. Abstractions Over Real-World Systems | Thane Ruthenis | Feb 18, 2025, 6:04 PM | 32 points | 10 comments | 4 min read
How accurate was my “Altered Traits” book review? | lsusr | Feb 18, 2025, 5:00 PM | 43 points | 3 comments | 3 min read
Medical Roundup #4 (thezvi.wordpress.com) | Zvi | Feb 18, 2025, 1:40 PM | 24 points | 3 comments | 10 min read
Dear AGI, | Nathan Young | Feb 18, 2025, 10:48 AM | 88 points | 11 comments | 3 min read
There are a lot of upcoming retreats/conferences between March and July (2025) | gergogaspar and ENAIS | Feb 18, 2025, 9:30 AM | 6 points | 0 comments | 1 min read
Sea Change (www.dailymicrofiction.com) | Charlie Sanders | Feb 18, 2025, 6:03 AM | −2 points | 2 comments | 5 min read
Born on Third Base: The Case for Inheriting Nothing and Building Everything | charlieoneill | Feb 18, 2025, 12:47 AM | −24 points | 16 comments | 2 min read
Do models know when they are being evaluated? | Govind Pimpale, Giles, Joe Needham and Marius Hobbhahn | Feb 17, 2025, 11:13 PM | 59 points | 8 comments | 12 min read
AGI Safety & Alignment @ Google DeepMind is hiring | Rohin Shah | Feb 17, 2025, 9:11 PM | 102 points | 19 comments | 10 min read
The Peeperi (unfinished) - By Katja Grace (docs.google.com) | Nathan Young | Feb 17, 2025, 7:33 PM | 22 points | 0 comments | 3 min read
Progress links and short notes, 2025-02-17 (newsletter.rootsofprogress.org) | jasoncrawford | Feb 17, 2025, 7:18 PM | 8 points | 0 comments | 7 min read
Claude 3.5 Sonnet (New)’s AGI scenario | Nathan Young | Feb 17, 2025, 6:47 PM | 5 points | 2 comments | 5 min read
Talking to laymen about AI development | David Steel | Feb 17, 2025, 6:42 PM | 8 points | 0 comments | 1 min read
On the Rebirth of Aristocracy in the American Regime (shawkisukkar.substack.com) | shawkisukkar | Feb 17, 2025, 4:18 PM | −16 points | 3 comments | 9 min read
Ascetic hedonism (dkl9.net) | dkl9 | Feb 17, 2025, 3:56 PM | 15 points | 9 comments | 2 min read