LessWrong Archive: July 2023 (Page 1)
Douglas Hofstadter changes his mind on Deep Learning & AI risk (June 2023)? · gwern · Jul 3, 2023, 12:48 AM · 426 points · 54 comments · 7 min read · LW link (www.youtube.com)
Alignment Grantmaking is Funding-Limited Right Now · johnswentworth · Jul 19, 2023, 4:49 PM · 312 points · 68 comments · 1 min read · LW link
Accidentally Load Bearing · jefftk · Jul 13, 2023, 4:10 PM · 287 points · 18 comments · 1 min read · LW link (www.jefftk.com) · 1 review
Yes, It’s Subjective, But Why All The Crabs? · johnswentworth · Jul 28, 2023, 7:35 PM · 250 points · 15 comments · 6 min read · LW link
Cultivating a state of mind where new ideas are born · Henrik Karlsson · Jul 27, 2023, 9:16 AM · 244 points · 21 comments · 14 min read · LW link (www.henrikkarlsson.xyz) · 2 reviews
Self-driving car bets · paulfchristiano · Jul 29, 2023, 6:10 PM · 236 points · 44 comments · 5 min read · LW link (sideways-view.com)
Ways I Expect AI Regulation To Increase Extinction Risk · 1a3orn · Jul 4, 2023, 5:32 PM · 226 points · 32 comments · 7 min read · LW link
Consciousness as a conflationary alliance term for intrinsically valued internal experiences · Andrew_Critch · Jul 10, 2023, 8:09 AM · 214 points · 54 comments · 11 min read · LW link · 2 reviews
My “2.9 trauma limit” · Raemon · Jul 1, 2023, 7:32 PM · 196 points · 31 comments · 7 min read · LW link
Towards Developmental Interpretability · Jesse Hoogland, Alexander Gietelink Oldenziel, Daniel Murfet and Stan van Wingerden · Jul 12, 2023, 7:33 PM · 192 points · 10 comments · 9 min read · LW link · 1 review
Grant applications and grand narratives · Elizabeth · Jul 2, 2023, 12:16 AM · 191 points · 22 comments · 6 min read · LW link
Cryonics and Regret · MvB · Jul 24, 2023, 9:16 AM · 190 points · 35 comments · 2 min read · LW link · 1 review
[Linkpost] Introducing Superalignment · beren · Jul 5, 2023, 6:23 PM · 175 points · 69 comments · 1 min read · LW link (openai.com)
Rationality !== Winning · Raemon · Jul 24, 2023, 2:53 AM · 170 points · 51 comments · 4 min read · LW link
Why it’s so hard to talk about Consciousness · Rafael Harth · Jul 2, 2023, 3:56 PM · 167 points · 215 comments · 9 min read · LW link · 3 reviews
When can we trust model evaluations? · evhub · Jul 28, 2023, 7:42 PM · 166 points · 10 comments · 10 min read · LW link · 1 review
Jailbreaking GPT-4’s code interpreter · Nikola Jurkovic · Jul 13, 2023, 6:43 PM · 160 points · 22 comments · 7 min read · LW link
OpenAI Launches Superalignment Taskforce · Zvi · Jul 11, 2023, 1:00 PM · 150 points · 40 comments · 49 min read · LW link (thezvi.wordpress.com)
Brain Efficiency Cannell Prize Contest Award Ceremony · Alexander Gietelink Oldenziel · Jul 24, 2023, 11:30 AM · 149 points · 12 comments · 7 min read · LW link
The Goddess of Everything Else—The Animation · Writer · Jul 13, 2023, 4:26 PM · 142 points · 4 comments · 1 min read · LW link (youtu.be)
The Seeker’s Game – Vignettes from the Bay · Yulia · Jul 9, 2023, 7:32 PM · 141 points · 19 comments · 16 min read · LW link
Going Crazy and Getting Better Again · Evenstar · Jul 2, 2023, 6:55 PM · 139 points · 13 comments · 7 min read · LW link · 1 review
Ten Levels of AI Alignment Difficulty · Sammy Martin · Jul 3, 2023, 8:20 PM · 138 points · 24 comments · 12 min read · LW link · 1 review
Neuronpedia · Johnny Lin · Jul 26, 2023, 4:29 PM · 135 points · 51 comments · 2 min read · LW link (neuronpedia.org)
How LLMs are and are not myopic · janus · Jul 25, 2023, 2:19 AM · 135 points · 16 comments · 8 min read · LW link
Views on when AGI comes and on strategy to reduce existential risk · TsviBT · Jul 8, 2023, 9:00 AM · 133 points · 61 comments · 14 min read · LW link · 1 review
Introducing Fatebook: the fastest way to make and track predictions · Adam B and Sage Future · Jul 11, 2023, 3:28 PM · 132 points · 41 comments · 1 min read · LW link (fatebook.io) · 2 reviews
Even Superhuman Go AIs Have Surprising Failure Modes · AdamGleave, EuanMcLean, Tony Wang, Kellin Pelrine, Tom Tseng, Yawen Duan, Joseph Miller and MichaelDennis · Jul 20, 2023, 5:31 PM · 130 points · 22 comments · 10 min read · LW link (far.ai)
Reducing sycophancy and improving honesty via activation steering · Nina Panickssery · Jul 28, 2023, 2:46 AM · 122 points · 18 comments · 9 min read · LW link · 1 review
Why was the AI Alignment community so unprepared for this moment? · Ras1513 · Jul 15, 2023, 12:26 AM · 121 points · 65 comments · 2 min read · LW link
“Reframing Superintelligence” + LLMs + 4 years · Eric Drexler · Jul 10, 2023, 1:42 PM · 118 points · 9 comments · 12 min read · LW link
Winners of AI Alignment Awards Research Contest · Orpheus16 and OliviaJ · Jul 13, 2023, 4:14 PM · 115 points · 4 comments · 12 min read · LW link (alignmentawards.com)
Introducing bayescalc.io · Adele Lopez · Jul 7, 2023, 4:11 PM · 115 points · 29 comments · 1 min read · LW link (bayescalc.io)
QAPR 5: grokking is maybe not *that* big a deal? · Quintin Pope · Jul 23, 2023, 8:14 PM · 114 points · 15 comments · 9 min read · LW link
Measuring and Improving the Faithfulness of Model-Generated Reasoning · Ansh Radhakrishnan, tamera, karinanguyen, Sam Bowman and Ethan Perez · Jul 18, 2023, 4:36 PM · 111 points · 15 comments · 6 min read · LW link · 1 review
Priorities for the UK Foundation Models Taskforce · Andrea_Miotti · Jul 21, 2023, 3:23 PM · 105 points · 4 comments · 5 min read · LW link (www.conjecture.dev)
Consider Joining the UK Foundation Model Taskforce · Zvi · Jul 10, 2023, 1:50 PM · 105 points · 12 comments · 1 min read · LW link (thezvi.wordpress.com)
A transcript of the TED talk by Eliezer Yudkowsky · Mikhail Samin · Jul 12, 2023, 12:12 PM · 105 points · 13 comments · 4 min read · LW link
Anthropic Observations · Zvi · Jul 25, 2023, 12:50 PM · 104 points · 1 comment · 10 min read · LW link (thezvi.wordpress.com)
Fixed Point: a love story · Richard_Ngo · Jul 8, 2023, 1:56 PM · 99 points · 2 comments · 7 min read · LW link
Meta-level adversarial evaluation of oversight techniques might allow robust measurement of their adequacy · Buck and ryan_greenblatt · Jul 26, 2023, 5:02 PM · 99 points · 19 comments · 1 min read · LW link · 1 review
When Someone Tells You They’re Lying, Believe Them · ymeskhout · Jul 14, 2023, 12:31 AM · 95 points · 3 comments · 3 min read · LW link
“Justice, Cherryl.” · Zack_M_Davis · Jul 23, 2023, 4:16 PM · 91 points · 21 comments · 9 min read · LW link · 1 review
BCIs and the ecosystem of modular minds · beren · Jul 21, 2023, 3:58 PM · 88 points · 14 comments · 11 min read · LW link
Apollo Neuro Results · Elizabeth · Jul 30, 2023, 6:40 PM · 85 points · 17 comments · 3 min read · LW link (acesounderglass.com)
[Question] What Does LessWrong/EA Think of Human Intelligence Augmentation as of mid-2023? · lukemarks · Jul 8, 2023, 11:42 AM · 84 points · 28 comments · 2 min read · LW link
Underwater Torture Chambers: The Horror Of Fish Farming · omnizoid · Jul 26, 2023, 12:27 AM · 83 points · 50 comments · 10 min read · LW link · 1 review
Sapient Algorithms · Valentine · Jul 17, 2023, 4:30 PM · 82 points · 15 comments · 5 min read · LW link
A $10k retroactive grant for VaccinateCA · Austin Chen · Jul 27, 2023, 6:14 PM · 82 points · 0 comments · LW link (manifund.org)
Compute Thresholds: proposed rules to mitigate risk of a “lab leak” accident during AI training runs · davidad · Jul 22, 2023, 6:09 PM · 80 points · 2 comments · 2 min read · LW link