[Question] Anki setup best practices? · Sinclair Chen · Dec 25, 2023, 10:34 PM · 11 points · 4 comments · 1 min read · LW link
[Question] Why does expected utility matter? · Marco Discendenti · Dec 25, 2023, 2:47 PM · 18 points · 21 comments · 4 min read · LW link
Freeze Dried Raspberry Truffles · jefftk · Dec 25, 2023, 2:10 PM · 14 points · 0 comments · 1 min read · LW link (www.jefftk.com)
Pornographic and semi-pornographic ads on mainstream websites as an instance of the AI alignment problem? · greenrd · Dec 25, 2023, 1:19 PM · −1 points · 5 comments · 12 min read · LW link
Defense Against The Dark Arts: An Introduction · Lyrongolem · Dec 25, 2023, 6:36 AM · 24 points · 36 comments · 20 min read · LW link
Occlusions of Moral Knowledge · herschel · Dec 25, 2023, 5:55 AM · −1 points · 0 comments · 2 min read · LW link (brothernin.substack.com)
[Question] Would you have a baby in 2024? · martinkunev · Dec 25, 2023, 1:52 AM · 24 points · 76 comments · 1 min read · LW link
align your latent spaces · bhauth · Dec 24, 2023, 4:30 PM · 27 points · 8 comments · 2 min read · LW link (www.bhauth.com)
Viral Guessing Game · jefftk · Dec 24, 2023, 1:10 PM · 19 points · 0 comments · 1 min read · LW link (www.jefftk.com)
The Sugar Alignment Problem · Adam Zerner · Dec 24, 2023, 1:35 AM · 5 points · 3 comments · 7 min read · LW link
A Crisper Explanation of Simulacrum Levels · Thane Ruthenis · Dec 23, 2023, 10:13 PM · 92 points · 13 comments · 13 min read · LW link
Hyperbolic Discounting and Pascal’s Mugging · Andrew Keenan Richardson · Dec 23, 2023, 9:55 PM · 9 points · 0 comments · 7 min read · LW link
AISN #28: Center for AI Safety 2023 Year in Review · Dan H · Dec 23, 2023, 9:31 PM · 30 points · 1 comment · 5 min read · LW link (newsletter.safe.ai)
“Inftoxicity” and other new words to describe malicious information and communication thereof · Jáchym Fibír · Dec 23, 2023, 6:15 PM · −1 points · 6 comments · 3 min read · LW link
AI’s impact on biology research: Part I, today · octopocta · Dec 23, 2023, 4:29 PM · 31 points · 6 comments · 2 min read · LW link
AI Girlfriends Won’t Matter Much · Maxwell Tabarrok · Dec 23, 2023, 3:58 PM · 42 points · 22 comments · 2 min read · LW link (maximumprogress.substack.com)
The Next Right Token · jefftk · Dec 23, 2023, 3:20 AM · 14 points · 0 comments · 1 min read · LW link (www.jefftk.com)
Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5) · Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah · Dec 23, 2023, 2:46 AM · 18 points · 0 comments · 4 min read · LW link
Fact Finding: How to Think About Interpreting Memorisation (Post 4) · Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah · Dec 23, 2023, 2:46 AM · 22 points · 0 comments · 9 min read · LW link
Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3) · Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah · Dec 23, 2023, 2:46 AM · 10 points · 1 comment · 16 min read · LW link
Fact Finding: Simplifying the Circuit (Post 2) · Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah · Dec 23, 2023, 2:45 AM · 25 points · 3 comments · 14 min read · LW link
Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1) · Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah · Dec 23, 2023, 2:44 AM · 106 points · 10 comments · 22 min read · LW link · 2 reviews
Measurement tampering detection as a special case of weak-to-strong generalization · ryan_greenblatt, Fabien Roger and Buck · Dec 23, 2023, 12:05 AM · 57 points · 10 comments · 4 min read · LW link
How does a toy 2 digit subtraction transformer predict the difference? · Evan Anders · Dec 22, 2023, 9:17 PM · 12 points · 0 comments · 10 min read · LW link (evanhanders.blog)
Thoughts on Max Tegmark’s AI verification · Johannes C. Mayer · Dec 22, 2023, 8:38 PM · 10 points · 0 comments · 3 min read · LW link
Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations) · Thane Ruthenis · Dec 22, 2023, 8:19 PM · 75 points · 14 comments · 6 min read · LW link
AI safety advocates should consider providing gentle pushback following the events at OpenAI · civilsociety · Dec 22, 2023, 6:55 PM · 16 points · 5 comments · 3 min read · LW link
“Destroy humanity” as an immediate subgoal · Seth Ahrenbach · Dec 22, 2023, 6:52 PM · 3 points · 13 comments · 3 min read · LW link
Synthetic Restrictions · nano_brasca · Dec 22, 2023, 6:50 PM · 10 points · 0 comments · 4 min read · LW link
Review Report of Davidson on Takeoff Speeds (2023) · Trent Kannegieter · Dec 22, 2023, 6:48 PM · 37 points · 11 comments · 38 min read · LW link
The problems with the concept of an infohazard as used by the LW community [Linkpost] · Noosphere89 · Dec 22, 2023, 4:13 PM · 75 points · 43 comments · 3 min read · LW link (www.beren.io)
Employee Incentives Make AGI Lab Pauses More Costly · Nikola Jurkovic · Dec 22, 2023, 5:04 AM · 28 points · 12 comments · 3 min read · LW link
The LessWrong 2022 Review: Review Phase · RobertM · Dec 22, 2023, 3:23 AM · 58 points · 7 comments · 2 min read · LW link
The absence of self-rejection is self-acceptance · Chipmonk · Dec 21, 2023, 9:54 PM · 24 points · 1 comment · 1 min read · LW link (chipmonk.substack.com)
A Decision Theory Can Be Rational or Computable, but Not Both · StrivingForLegibility · Dec 21, 2023, 9:02 PM · 9 points · 4 comments · 1 min read · LW link
Most People Don’t Realize We Have No Idea How Our AIs Work · Thane Ruthenis · Dec 21, 2023, 8:02 PM · 159 points · 42 comments · 1 min read · LW link
Pseudonymity and Accusations · jefftk · Dec 21, 2023, 7:20 PM · 52 points · 20 comments · 3 min read · LW link (www.jefftk.com)
Attention on AI X-Risk Likely Hasn’t Distracted from Current Harms from AI · Erich_Grunewald · Dec 21, 2023, 5:24 PM · 26 points · 2 comments · 17 min read · LW link (www.erichgrunewald.com)
“Alignment” is one of six words of the year in the Harvard Gazette · Nikola Jurkovic · Dec 21, 2023, 3:54 PM · 14 points · 1 comment · 1 min read · LW link (news.harvard.edu)
AI #43: Functional Discoveries · Zvi · Dec 21, 2023, 3:50 PM · 52 points · 26 comments · 49 min read · LW link (thezvi.wordpress.com)
Rating my AI Predictions · Robert_AIZI · Dec 21, 2023, 2:07 PM · 22 points · 5 comments · 2 min read · LW link (aizi.substack.com)
AI Safety Chatbot · markov and Robert Miles · Dec 21, 2023, 2:06 PM UTC · 61 points · 11 comments · 4 min read · LW link
On OpenAI’s Preparedness Framework · Zvi · Dec 21, 2023, 2:00 PM UTC · 51 points · 4 comments · 21 min read · LW link (thezvi.wordpress.com)
Prediction Markets aren’t Magic · SimonM · Dec 21, 2023, 12:54 PM UTC · 90 points · 29 comments · 3 min read · LW link
[Question] Why is capnometry biofeedback not more widely known? · riceissa · Dec 21, 2023, 2:42 AM UTC · 20 points · 22 comments · 4 min read · LW link
My best guess at the important tricks for training 1L SAEs · Arthur Conmy · Dec 21, 2023, 1:59 AM UTC · 37 points · 4 comments · 3 min read · LW link
Seattle Winter Solstice · a7x · Dec 20, 2023, 8:30 PM UTC · 6 points · 1 comment · 1 min read · LW link
How Would an Utopia-Maximizer Look Like? · Thane Ruthenis · Dec 20, 2023, 8:01 PM UTC · 32 points · 23 comments · 10 min read · LW link
Succession · Richard_Ngo · Dec 20, 2023, 7:25 PM UTC · 159 points · 48 comments · 11 min read · LW link (www.narrativeark.xyz)
Metaculus Introduces Multiple Choice Questions · ChristianWilliams · Dec 20, 2023, 7:00 PM UTC · 4 points · 0 comments · LW link (www.metaculus.com)