Responses to apparent rationalist confusions about game / decision theory
Anthony DiGiovanni
Aug 30, 2023, 10:02 PM
142
points
20
comments
12
min read
LW
link
1
review
Invulnerable Incomplete Preferences: A Formal Statement
SCP
Aug 30, 2023, 9:59 PM
134
points
39
comments
35
min read
LW
link
Report on Frontier Model Training
YafahEdelman
Aug 30, 2023, 8:02 PM
122
points
21
comments
21
min read
LW
link
(docs.google.com)
An adversarial example for Direct Logit Attribution: memory management in gelu-4l
Can, Yeu-Tong Lau, James Dao and Jett Janiak
Aug 30, 2023, 5:36 PM
17
points
0
comments
8
min read
LW
link
(arxiv.org)
A Letter to the Editor of MIT Technology Review
Jeffs
Aug 30, 2023, 4:59 PM
0
points
0
comments
2
min read
LW
link
Biosecurity Culture, Computer Security Culture
jefftk
Aug 30, 2023, 4:40 PM
103
points
11
comments
2
min read
LW
link
(www.jefftk.com)
Why I hang out at LessWrong and why you should check-in there every now and then
Bill Benzon
Aug 30, 2023, 3:20 PM
16
points
5
comments
5
min read
LW
link
“Wanting” and “liking”
Mateusz Bagiński
Aug 30, 2023, 2:52 PM
23
points
3
comments
29
min read
LW
link
Open Call for Research Assistants in Developmental Interpretability
Jesse Hoogland, Daniel Murfet, Alexander Gietelink Oldenziel and Stan van Wingerden
Aug 30, 2023, 9:02 AM
56
points
11
comments
4
min read
LW
link
LTFF and EAIF are unusually funding-constrained right now
Linch and calebp99
Aug 30, 2023, 1:03 AM
90
points
24
comments
15
min read
LW
link
(forum.effectivealtruism.org)
Paper Walkthrough: Automated Circuit Discovery with Arthur Conmy
Neel Nanda
Aug 29, 2023, 10:07 PM
36
points
1
comment
1
min read
LW
link
(www.youtube.com)
An OV-Coherent Toy Model of Attention Head Superposition
Lauren Greenspan and keith_wynroe
Aug 29, 2023, 7:44 PM
26
points
2
comments
6
min read
LW
link
The Economics of the Asteroid Deflection Problem (Dominant Assurance Contracts)
moyamo
Aug 29, 2023, 6:28 PM
78
points
71
comments
15
min read
LW
link
Democratic Fine-Tuning
Joe Edelman
Aug 29, 2023, 6:13 PM
22
points
2
comments
1
min read
LW
link
(open.substack.com)
Should rationalists (be seen to) win?
Will_Pearson
Aug 29, 2023, 6:13 PM
6
points
7
comments
1
min read
LW
link
Frankfurt meetup
sultan
Aug 29, 2023, 6:10 PM
2
points
0
comments
1
min read
LW
link
Istanbul meetup
sultan
Aug 29, 2023, 6:10 PM
2
points
0
comments
1
min read
LW
link
Broken Benchmark: MMLU
awg
Aug 29, 2023, 6:09 PM
24
points
5
comments
1
min read
LW
link
(www.youtube.com)
AISN #20: LLM Proliferation, AI Deception, and Continuing Drivers of AI Capabilities
Dan H
Aug 29, 2023, 3:07 PM
12
points
0
comments
8
min read
LW
link
(newsletter.safe.ai)
Loft Bed Fan Guard
jefftk
Aug 29, 2023, 1:30 PM
16
points
3
comments
1
min read
LW
link
(www.jefftk.com)
Dating Roundup #1: This is Why You’re Single
Zvi
Aug 29, 2023, 12:50 PM
87
points
28
comments
38
min read
LW
link
(thezvi.wordpress.com)
Neural Recognizers: Some [old] notes based on a TV tube metaphor [perceptual contact with the world]
Bill Benzon
Aug 29, 2023, 11:33 AM
4
points
0
comments
5
min read
LW
link
Barriers to Mechanistic Interpretability for AGI Safety
Connor Leahy
Aug 29, 2023, 10:56 AM
63
points
13
comments
1
min read
LW
link
(www.youtube.com)
Newcomb Variant
lsusr
Aug 29, 2023, 7:02 AM
25
points
23
comments
1
min read
LW
link
[Question]
Incentives affecting alignment-researcher encouragement
Nicholas / Heather Kross
Aug 29, 2023, 5:11 AM
28
points
3
comments
1
min read
LW
link
Anyone want to debate publicly about FDT?
Bentham's Bulldog
Aug 29, 2023, 3:45 AM
13
points
31
comments
1
min read
LW
link
AI Deception: A Survey of Examples, Risks, and Potential Solutions
Simon Goldstein and Peter S. Park
Aug 29, 2023, 1:29 AM
54
points
3
comments
10
min read
LW
link
An Interpretability Illusion for Activation Patching of Arbitrary Subspaces
Georg Lange, Alex Makelov and Neel Nanda
Aug 29, 2023, 1:04 AM
77
points
4
comments
1
min read
LW
link
OpenAI API base models are not sycophantic, at any size
nostalgebraist
Aug 29, 2023, 12:58 AM
183
points
20
comments
2
min read
LW
link
(colab.research.google.com)
Paradigms and Theory Choice in AI: Adaptivity, Economy and Control
particlemania
Aug 28, 2023, 10:19 PM
4
points
0
comments
16
min read
LW
link
[Question]
Humanities In A Post-Conscious AI World?
Netcentrica
Aug 28, 2023, 9:59 PM
1
point
1
comment
2
min read
LW
link
Introducing the Center for AI Policy (& we’re hiring!)
Thomas Larsen
Aug 28, 2023, 9:17 PM
123
points
50
comments
2
min read
LW
link
(www.aipolicy.us)
[Question]
45% to 55% vs. 90% to 100%
yhoiseth
Aug 28, 2023, 7:15 PM
5
points
8
comments
4
min read
LW
link
Information warfare historically revolved around human conduits
trevor
Aug 28, 2023, 6:54 PM
37
points
7
comments
3
min read
LW
link
The Evidence for Question Decomposition is Weak
niplav
Aug 28, 2023, 3:46 PM
22
points
6
comments
5
min read
LW
link
ACX Meetup Anywhere, Bratislava, Slovakia
David Varga
Aug 28, 2023, 3:40 PM
1
point
0
comments
1
min read
LW
link
The Anthropic Principle Tells Us That AGI Will Not Be Conscious
nem
Aug 28, 2023, 3:25 PM
2
points
8
comments
1
min read
LW
link
No More Freezer Pucks
jefftk
Aug 28, 2023, 3:20 PM
10
points
7
comments
1
min read
LW
link
(www.jefftk.com)
The mind as a polyviscous fluid
Bill Benzon
Aug 28, 2023, 2:38 PM
8
points
0
comments
3
min read
LW
link
[Question]
Who can most reduce X-Risk?
sudhanshu_kasewa
Aug 28, 2023, 2:38 PM
1
point
12
comments
1
min read
LW
link
Drinks at a bar
yakimoff
Aug 28, 2023, 3:13 AM
2
points
0
comments
1
min read
LW
link
Dear Self; we need to talk about ambition
Elizabeth
Aug 27, 2023, 11:10 PM
270
points
28
comments
8
min read
LW
link
2
reviews
(acesounderglass.com)
AI pause/governance advocacy might be net-negative, especially without a focus on explaining x-risk
Mikhail Samin
Aug 27, 2023, 11:05 PM
72
points
9
comments
6
min read
LW
link
Will issues are quite nearly skill issues
dkl9
Aug 27, 2023, 4:42 PM
1
point
1
comment
3
min read
LW
link
(dkl9.net)
Xanadu, GPT, and Beyond: An adventure of the mind
Bill Benzon
Aug 27, 2023, 4:19 PM
2
points
0
comments
5
min read
LW
link
High level overview on how to go about estimating “p(doom)” or the like
Aryeh Englander
Aug 27, 2023, 4:01 PM
16
points
0
comments
5
min read
LW
link
Trying a Wet Suit
jefftk
Aug 27, 2023, 3:00 PM
33
points
5
comments
1
min read
LW
link
(www.jefftk.com)
Apply to a small iteration of MLAB in Oxford
RP, MariaK and OliverHayman
Aug 27, 2023, 2:54 PM
2
points
0
comments
1
min read
LW
link
Apply to a small iteration of MLAB to be run in Oxford
RP, MariaK and OliverHayman
Aug 27, 2023, 2:21 PM
12
points
0
comments
1
min read
LW
link
The Game of Dominance
Karl von Wendt
Aug 27, 2023, 11:04 AM
24
points
15
comments
6
min read
LW
link