LessWrong Archive: December 2023, Page 1
- Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible · GeneSmith and kman · Dec 12, 2023, 6:14 PM · 458 points · 206 comments · 33 min read · 2 reviews
- Speaking to Congressional staffers about AI risk · Orpheus16 and hath · Dec 4, 2023, 11:08 PM · 312 points · 25 comments · 15 min read · 1 review
- Constellations are Younger than Continents · Jeffrey Heninger · Dec 19, 2023, 6:12 AM · 263 points · 21 comments · 2 min read
- AI Control: Improving Safety Despite Intentional Subversion · Buck, Fabien Roger, ryan_greenblatt and Kshitij Sachan · Dec 13, 2023, 3:51 PM · 236 points · 24 comments · 10 min read · 4 reviews
- Thoughts on “AI is easy to control” by Pope & Belrose · Steven Byrnes · Dec 1, 2023, 5:30 PM · 197 points · 63 comments · 14 min read · 1 review
- Is being sexy for your homies? · Valentine · Dec 13, 2023, 8:37 PM · 194 points · 100 comments · 14 min read · 2 reviews
- “Humanity vs. AGI” Will Never Look Like “Humanity vs. AGI” to Humanity · Thane Ruthenis · Dec 16, 2023, 8:08 PM · 191 points · 34 comments · 5 min read
- Effective Aspersions: How the Nonlinear Investigation Went Wrong · TracingWoodgrains · Dec 19, 2023, 12:00 PM · 188 points · 172 comments · 2 reviews
- re: Yudkowsky on biological materials · bhauth · Dec 11, 2023, 1:28 PM · 182 points · 30 comments · 5 min read
- The ‘Neglected Approaches’ Approach: AE Studio’s Alignment Agenda · Cameron Berg, Judd Rosenblatt, AE Studio and Marc Carauleanu · Dec 18, 2023, 8:35 PM · 177 points · 23 comments · 12 min read · 1 review
- Critical review of Christiano’s disagreements with Yudkowsky · Vanessa Kosoy · Dec 27, 2023, 4:02 PM · 176 points · 40 comments · 15 min read
- 2023 Unofficial LessWrong Census/Survey · Screwtape · Dec 2, 2023, 4:41 AM · 169 points · 81 comments · 1 min read
- How useful is mechanistic interpretability? · ryan_greenblatt, Neel Nanda, Buck and habryka · Dec 1, 2023, 2:54 AM · 167 points · 54 comments · 25 min read
- The likely first longevity drug is based on sketchy science. This is bad for science and bad for longevity. · BobBurgers · Dec 12, 2023, 2:42 AM · 161 points · 34 comments · 5 min read
- Most People Don’t Realize We Have No Idea How Our AIs Work · Thane Ruthenis · Dec 21, 2023, 8:02 PM · 159 points · 42 comments · 1 min read
- Succession · Richard_Ngo · Dec 20, 2023, 7:25 PM · 159 points · 48 comments · 11 min read · (www.narrativeark.xyz)
- The Plan − 2023 Version · johnswentworth · Dec 29, 2023, 11:34 PM · 152 points · 40 comments · 31 min read · 1 review
- Discussion: Challenges with Unsupervised LLM Knowledge Discovery · Seb Farquhar, Vikrant Varma, zac_kenton, gasteigerjo, Vlad Mikulik and Rohin Shah · Dec 18, 2023, 11:58 AM · 147 points · 21 comments · 10 min read
- AI Views Snapshots · Rob Bensinger · Dec 13, 2023, 12:45 AM · 142 points · 61 comments · 1 min read
- The Dark Arts · lsusr and Lyrongolem · Dec 19, 2023, 4:41 AM · 134 points · 49 comments · 9 min read
- Current AIs Provide Nearly No Data Relevant to AGI Alignment · Thane Ruthenis · Dec 15, 2023, 8:16 PM · 132 points · 157 comments · 8 min read · 1 review
- Natural Latents: The Math · johnswentworth and David Lorell · Dec 27, 2023, 7:03 PM · 129 points · 41 comments · 12 min read · 2 reviews
- Deep Forgetting & Unlearning for Safely-Scoped LLMs · scasper · Dec 5, 2023, 4:48 PM · 126 points · 30 comments · 13 min read
- Bayesian Injustice · Kevin Dorst · Dec 14, 2023, 3:44 PM · 124 points · 10 comments · 6 min read · (kevindorst.substack.com)
- The LessWrong 2022 Review · habryka · Dec 5, 2023, 4:00 AM · 115 points · 43 comments · 4 min read
- Mapping the semantic void: Strange goings-on in GPT embedding spaces · mwatkins · Dec 14, 2023, 1:10 PM · 114 points · 31 comments · 14 min read
- What I Would Do If I Were Working On AI Governance · johnswentworth · Dec 8, 2023, 6:43 AM · 110 points · 32 comments · 10 min read
- “AI Alignment” is a Dangerously Overloaded Term · Roko · Dec 15, 2023, 2:34 PM · 108 points · 100 comments · 3 min read
- [Question] How do you feel about LessWrong these days? [Open feedback thread] · Bird Concept · Dec 5, 2023, 8:54 PM · 108 points · 285 comments · 1 min read
- Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1) · Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah · Dec 23, 2023, 2:44 AM · 106 points · 10 comments · 22 min read · 2 reviews
- On the future of language models · owencb · Dec 20, 2023, 4:58 PM · 105 points · 17 comments
- The Witness · Richard_Ngo · Dec 3, 2023, 10:27 PM · 105 points · 5 comments · 14 min read · (www.narrativeark.xyz)
- Nonlinear’s Evidence: Debunking False and Misleading Claims · KatWoods · Dec 12, 2023, 1:16 PM · 104 points · 171 comments
- [Valence series] 1. Introduction · Steven Byrnes · Dec 4, 2023, 3:40 PM · 99 points · 16 comments · 16 min read · 2 reviews
- Nietzsche’s Morality in Plain English · Arjun Panickssery · Dec 4, 2023, 12:57 AM · 92 points · 14 comments · 4 min read · 1 review · (arjunpanickssery.substack.com)
- A Crisper Explanation of Simulacrum Levels · Thane Ruthenis · Dec 23, 2023, 10:13 PM · 92 points · 13 comments · 13 min read
- Meaning & Agency · abramdemski · Dec 19, 2023, 10:27 PM · 91 points · 17 comments · 14 min read
- Prediction Markets aren’t Magic · SimonM · Dec 21, 2023, 12:54 PM · 90 points · 29 comments · 3 min read
- Based Beff Jezos and the Accelerationists · Zvi · Dec 6, 2023, 4:00 PM · 90 points · 29 comments · 12 min read · (thezvi.wordpress.com)
- [Valence series] 2. Valence & Normativity · Steven Byrnes · Dec 7, 2023, 4:43 PM · 88 points · 7 comments · 28 min read · 1 review
- Some for-profit AI alignment org ideas · Eric Ho · Dec 14, 2023, 2:23 PM · 87 points · 19 comments · 9 min read
- A Universal Emergent Decomposition of Retrieval Tasks in Language Models · Alexandre Variengien and Eric Winsor · Dec 19, 2023, 11:52 AM · 84 points · 3 comments · 10 min read · (arxiv.org)
- Refusal mechanisms: initial experiments with Llama-2-7b-chat · Andy Arditi and Oscar Obeso · Dec 8, 2023, 5:08 PM · 82 points · 7 comments · 7 min read
- Studying The Alien Mind · Quentin FEUILLADE--MONTIXI and NicholasKees · Dec 5, 2023, 5:27 PM · 80 points · 10 comments · 15 min read
- EU policymakers reach an agreement on the AI Act · tlevin · Dec 15, 2023, 6:02 AM · 78 points · 7 comments · 7 min read
- MATS Summer 2023 Retrospective · utilistrutil, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald and LauraVaughan · Dec 1, 2023, 11:29 PM · 77 points · 34 comments · 26 min read
- OpenAI: Leaks Confirm the Story · Zvi · Dec 12, 2023, 2:00 PM · 77 points · 9 comments · 16 min read · (thezvi.wordpress.com)
- [Valence series] 3. Valence & Beliefs · Steven Byrnes · Dec 11, 2023, 8:21 PM UTC · 77 points · 12 comments · 21 min read · 1 review
- Send us example gnarly bugs · Beth Barnes, Megan Kinniment and Tao Lin · Dec 10, 2023, 5:23 AM UTC · 77 points · 10 comments · 2 min read
- The Offense-Defense Balance Rarely Changes · Maxwell Tabarrok · Dec 9, 2023, 3:21 PM UTC · 77 points · 23 comments · 3 min read · (maximumprogress.substack.com)