Archive
align your latent spaces
bhauth · Dec 24, 2023, 4:30 PM · 27 points · 8 comments · 2 min read · LW link (www.bhauth.com)

Viral Guessing Game
jefftk · Dec 24, 2023, 1:10 PM · 19 points · 0 comments · 1 min read · LW link (www.jefftk.com)

The Sugar Alignment Problem
Adam Zerner · Dec 24, 2023, 1:35 AM · 5 points · 3 comments · 7 min read · LW link

A Crisper Explanation of Simulacrum Levels
Thane Ruthenis · Dec 23, 2023, 10:13 PM · 92 points · 13 comments · 13 min read · LW link

Hyperbolic Discounting and Pascal’s Mugging
Andrew Keenan Richardson · Dec 23, 2023, 9:55 PM · 9 points · 0 comments · 7 min read · LW link

AISN #28: Center for AI Safety 2023 Year in Review
Dan H · Dec 23, 2023, 9:31 PM · 30 points · 1 comment · 5 min read · LW link (newsletter.safe.ai)

“Inftoxicity” and other new words to describe malicious information and communication thereof
Jáchym Fibír · Dec 23, 2023, 6:15 PM · −1 points · 6 comments · 3 min read · LW link

AI’s impact on biology research: Part I, today
octopocta · Dec 23, 2023, 4:29 PM · 31 points · 6 comments · 2 min read · LW link

AI Girlfriends Won’t Matter Much
Maxwell Tabarrok · Dec 23, 2023, 3:58 PM · 42 points · 22 comments · 2 min read · LW link (maximumprogress.substack.com)

The Next Right Token
jefftk · Dec 23, 2023, 3:20 AM · 14 points · 0 comments · 1 min read · LW link (www.jefftk.com)
Fact Finding: Do Early Layers Specialise in Local Processing? (Post 5)
Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah · Dec 23, 2023, 2:46 AM · 18 points · 0 comments · 4 min read · LW link

Fact Finding: How to Think About Interpreting Memorisation (Post 4)
Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah · Dec 23, 2023, 2:46 AM · 22 points · 0 comments · 9 min read · LW link

Fact Finding: Trying to Mechanistically Understanding Early MLPs (Post 3)
Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah · Dec 23, 2023, 2:46 AM · 10 points · 1 comment · 16 min read · LW link

Fact Finding: Simplifying the Circuit (Post 2)
Senthooran Rajamanoharan, Neel Nanda, János Kramár and Rohin Shah · Dec 23, 2023, 2:45 AM · 25 points · 3 comments · 14 min read · LW link

Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1)
Neel Nanda, Senthooran Rajamanoharan, János Kramár and Rohin Shah · Dec 23, 2023, 2:44 AM · 106 points · 10 comments · 22 min read · LW link · 2 reviews

Measurement tampering detection as a special case of weak-to-strong generalization
ryan_greenblatt, Fabien Roger and Buck · Dec 23, 2023, 12:05 AM · 57 points · 10 comments · 4 min read · LW link
How does a toy 2 digit subtraction transformer predict the difference?
Evan Anders · Dec 22, 2023, 9:17 PM · 12 points · 0 comments · 10 min read · LW link (evanhanders.blog)

Thoughts on Max Tegmark’s AI verification
Johannes C. Mayer · Dec 22, 2023, 8:38 PM · 10 points · 0 comments · 3 min read · LW link

Idealized Agents Are Approximate Causal Mirrors (+ Radical Optimism on Agent Foundations)
Thane Ruthenis · Dec 22, 2023, 8:19 PM · 75 points · 14 comments · 6 min read · LW link

AI safety advocates should consider providing gentle pushback following the events at OpenAI
civilsociety · Dec 22, 2023, 6:55 PM · 16 points · 5 comments · 3 min read · LW link

“Destroy humanity” as an immediate subgoal
Seth Ahrenbach · Dec 22, 2023, 6:52 PM · 3 points · 13 comments · 3 min read · LW link

Synthetic Restrictions
nano_brasca · Dec 22, 2023, 6:50 PM · 10 points · 0 comments · 4 min read · LW link

Review Report of Davidson on Takeoff Speeds (2023)
Trent Kannegieter · Dec 22, 2023, 6:48 PM · 37 points · 11 comments · 38 min read · LW link

The problems with the concept of an infohazard as used by the LW community [Linkpost]
Noosphere89 · Dec 22, 2023, 4:13 PM · 75 points · 43 comments · 3 min read · LW link (www.beren.io)

Employee Incentives Make AGI Lab Pauses More Costly
Nikola Jurkovic · Dec 22, 2023, 5:04 AM · 28 points · 12 comments · 3 min read · LW link

The LessWrong 2022 Review: Review Phase
RobertM · Dec 22, 2023, 3:23 AM · 58 points · 7 comments · 2 min read · LW link
The absence of self-rejection is self-acceptance
Chipmonk · Dec 21, 2023, 9:54 PM · 24 points · 1 comment · 1 min read · LW link (chipmonk.substack.com)

A Decision Theory Can Be Rational or Computable, but Not Both
StrivingForLegibility · Dec 21, 2023, 9:02 PM · 9 points · 4 comments · 1 min read · LW link

Most People Don’t Realize We Have No Idea How Our AIs Work
Thane Ruthenis · Dec 21, 2023, 8:02 PM · 159 points · 42 comments · 1 min read · LW link

Pseudonymity and Accusations
jefftk · Dec 21, 2023, 7:20 PM · 52 points · 20 comments · 3 min read · LW link (www.jefftk.com)

Attention on AI X-Risk Likely Hasn’t Distracted from Current Harms from AI
Erich_Grunewald · Dec 21, 2023, 5:24 PM · 26 points · 2 comments · 17 min read · LW link (www.erichgrunewald.com)

“Alignment” is one of six words of the year in the Harvard Gazette
Nikola Jurkovic · Dec 21, 2023, 3:54 PM · 14 points · 1 comment · 1 min read · LW link (news.harvard.edu)

AI #43: Functional Discoveries
Zvi · Dec 21, 2023, 3:50 PM · 52 points · 26 comments · 49 min read · LW link (thezvi.wordpress.com)

Rating my AI Predictions
Robert_AIZI · Dec 21, 2023, 2:07 PM · 22 points · 5 comments · 2 min read · LW link (aizi.substack.com)

AI Safety Chatbot
markov and Robert Miles · Dec 21, 2023, 2:06 PM · 61 points · 11 comments · 4 min read · LW link

On OpenAI’s Preparedness Framework
Zvi · Dec 21, 2023, 2:00 PM · 51 points · 4 comments · 21 min read · LW link (thezvi.wordpress.com)

Prediction Markets aren’t Magic
SimonM · Dec 21, 2023, 12:54 PM · 90 points · 29 comments · 3 min read · LW link
[Question] Why is capnometry biofeedback not more widely known?
riceissa · Dec 21, 2023, 2:42 AM · 20 points · 22 comments · 4 min read · LW link

My best guess at the important tricks for training 1L SAEs
Arthur Conmy · Dec 21, 2023, 1:59 AM · 37 points · 4 comments · 3 min read · LW link

Seattle Winter Solstice
a7x · Dec 20, 2023, 8:30 PM · 6 points · 1 comment · 1 min read · LW link

How Would an Utopia-Maximizer Look Like?
Thane Ruthenis · Dec 20, 2023, 8:01 PM · 32 points · 23 comments · 10 min read · LW link

Succession
Richard_Ngo · Dec 20, 2023, 7:25 PM · 159 points · 48 comments · 11 min read · LW link (www.narrativeark.xyz)

Metaculus Introduces Multiple Choice Questions
ChristianWilliams · Dec 20, 2023, 7:00 PM · 4 points · 0 comments · LW link (www.metaculus.com)

Brighter Than Today Versions
jefftk · Dec 20, 2023, 6:20 PM · 16 points · 2 comments · 2 min read · LW link (www.jefftk.com)

Gaia Network: a practical, incremental pathway to Open Agency Architecture
Roman Leventov and Rafael Kaufmann Nedal · Dec 20, 2023, 5:11 PM · 22 points · 8 comments · 16 min read · LW link

On the future of language models
owencb · Dec 20, 2023, 4:58 PM · 105 points · 17 comments · LW link

[Valence series] Appendix A: Hedonic tone / (dis)pleasure / (dis)liking
Steven Byrnes · Dec 20, 2023, 3:54 PM · 18 points · 0 comments · 13 min read · LW link

Matrix completion prize results
paulfchristiano · Dec 20, 2023, 3:40 PM · 42 points · 0 comments · 2 min read · LW link (www.alignment.org)
[Question] What’s the minimal additive constant for Kolmogorov Complexity that a programming language can achieve?
Noosphere89 · Dec 20, 2023, 3:36 PM · 11 points · 15 comments · 1 min read · LW link

Legalize butanol?
bhauth · Dec 20, 2023, 2:24 PM · 39 points · 20 comments · 5 min read · LW link (www.bhauth.com)