Pessimistic Shard Theory · Garrett Baker · Jan 25, 2023, 12:59 AM · 72 points · 13 comments · 3 min read
A general comment on discussions of genetic group differences · anonymous8101 · Jan 14, 2023, 2:11 AM · 71 points · 46 comments · 3 min read
“Status” can be corrosive; here’s how I handle it · Orpheus16 · Jan 24, 2023, 1:25 AM · 71 points · 8 comments · 6 min read
How we could stumble into AI catastrophe · HoldenKarnofsky · Jan 13, 2023, 4:20 PM · 71 points · 18 comments · 18 min read · (www.cold-takes.com)
Opportunity Cost Blackmail · adamShimi · Jan 2, 2023, 1:48 PM · 70 points · 11 comments · 2 min read · (epistemologicalvigilance.substack.com)
Some of my disagreements with List of Lethalities · TurnTrout · Jan 24, 2023, 12:25 AM · 70 points · 7 comments · 10 min read
Investing for a World Transformed by AI · PeterMcCluskey · Jan 1, 2023, 2:47 AM · 70 points · 24 comments · 6 min read · 1 review · (bayesianinvestor.com)
AGI safety field building projects I’d like to see · Severin T. Seehrich · Jan 19, 2023, 10:40 PM · 68 points · 28 comments · 9 min read
Infohazards vs Fork Hazards · jimrandomh · Jan 5, 2023, 9:45 AM · 68 points · 16 comments · 1 min read
Thoughts on hardware / compute requirements for AGI · Steven Byrnes · Jan 24, 2023, 2:03 PM · 63 points · 32 comments · 24 min read
Simulacra are Things · janus · Jan 8, 2023, 11:03 PM · 63 points · 7 comments · 2 min read
Dangers of deference · TsviBT · Jan 8, 2023, 2:36 PM · 62 points · 5 comments · 2 min read
Tracr: Compiled Transformers as a Laboratory for Interpretability | DeepMind · DragonGod · Jan 13, 2023, 4:53 PM · 62 points · 12 comments · 1 min read · (arxiv.org)
Escape Velocity from Bullshit Jobs · Zvi · Jan 10, 2023, 2:30 PM · 61 points · 18 comments · 5 min read · (thezvi.wordpress.com)
My first year in AI alignment · Alex_Altair · Jan 2, 2023, 1:28 AM · 61 points · 10 comments · 7 min read
Announcing aisafety.training · JJ Hepburn · Jan 21, 2023, 1:01 AM · 61 points · 4 comments · 1 min read
Spooky action at a distance in the loss landscape · Jesse Hoogland and Filip Sondej · Jan 28, 2023, 12:22 AM · 61 points · 4 comments · 7 min read · (www.jessehoogland.com)
Movie Review: Megan · Zvi · Jan 23, 2023, 12:50 PM · 60 points · 19 comments · 24 min read · (thezvi.wordpress.com)
LW Filter Tags (Rationality/World Modeling now promoted in Latest Posts) · Ruby and RobertM · Jan 28, 2023, 10:14 PM · 60 points · 4 comments · 3 min read
Assigning Praise and Blame: Decoupling Epistemology and Decision Theory · adamShimi and Gabriel Alfour · Jan 27, 2023, 6:16 PM · 59 points · 5 comments · 3 min read
Conversational canyons · Henrik Karlsson · Jan 4, 2023, 6:55 PM · 59 points · 4 comments · 7 min read · (escapingflatland.substack.com)
Announcing Cavendish Labs · derikk and agg · Jan 19, 2023, 8:15 PM · 59 points · 5 comments · 2 min read · (forum.effectivealtruism.org)
Consequentialists: One-Way Pattern Traps · David Udell · Jan 16, 2023, 8:48 PM · 59 points · 3 comments · 14 min read
[Linkpost] TIME article: DeepMind’s CEO Helped Take AI Mainstream. Now He’s Urging Caution · Orpheus16 · Jan 21, 2023, 4:51 PM · 58 points · 2 comments · 3 min read · (time.com)
Inverse Scaling Prize: Second Round Winners · Ian McKenzie, Sam Bowman and Ethan Perez · Jan 24, 2023, 8:12 PM · 58 points · 17 comments · 15 min read
My Advice for Incoming SERI MATS Scholars · Johannes C. Mayer · Jan 3, 2023, 7:25 PM · 58 points · 6 comments · 4 min read
Linear Algebra Done Right, Axler · David Udell · Jan 2, 2023, 10:54 PM · 57 points · 6 comments · 9 min read
Evidence under Adversarial Conditions · PeterMcCluskey · Jan 9, 2023, 4:21 PM · 57 points · 1 comment · 3 min read · (bayesianinvestor.com)
Consider paying for literature or book reviews using bounties and dominant assurance contracts · Arjun Panickssery · Jan 15, 2023, 3:56 AM · 57 points · 7 comments · 2 min read
Gradient Filtering · Jozdien and janus · Jan 18, 2023, 8:09 PM · 56 points · 16 comments · 13 min read
What’s going on with ‘crunch time’? · rosehadshar · Jan 20, 2023, 9:42 AM · 54 points · 6 comments · 4 min read
Reflections on Deception & Generality in Scalable Oversight (Another OpenAI Alignment Review) · Shoshannah Tekofsky · Jan 28, 2023, 5:26 AM · 53 points · 7 comments · 7 min read
Why you should learn sign language · Noah Topper · Jan 18, 2023, 5:03 PM · 53 points · 23 comments · 7 min read · (naivebayes.substack.com)
Why and How to Graduate Early [U.S.] · Tego · Jan 29, 2023, 1:28 AM · 53 points · 9 comments · 8 min read · 1 review
Paper: Superposition, Memorization, and Double Descent (Anthropic) · LawrenceC · Jan 5, 2023, 5:54 PM · 53 points · 11 comments · 1 min read · (transformer-circuits.pub)
Critique of some recent philosophy of LLMs’ minds · Roman Leventov · Jan 20, 2023, 12:53 PM · 52 points · 8 comments · 20 min read
Contra Common Knowledge · abramdemski · Jan 4, 2023, 10:50 PM · 52 points · 31 comments · 16 min read
How Likely is Losing a Google Account? · jefftk · Jan 30, 2023, 12:20 AM · 52 points · 12 comments · 3 min read · (www.jefftk.com)
Beware safety-washing · Lizka · Jan 13, 2023, 1:59 PM · 51 points · 2 comments · 4 min read
The Thingness of Things · TsviBT · Jan 1, 2023, 10:19 PM · 51 points · 35 comments · 10 min read
11 heuristics for choosing (alignment) research projects · Orpheus16 and danesherbs · Jan 27, 2023, 12:36 AM · 50 points · 5 comments · 1 min read
[Simulators seminar sequence] #1 Background & shared assumptions · Jan, Charlie Steiner, Logan Riggs, janus, jacquesthibs, metasemi, Michael Oesterle, Lucas Teixeira, peligrietzer and remember · Jan 2, 2023, 11:48 PM · 50 points · 4 comments · 3 min read
[Question] Would it be good or bad for the US military to get involved in AI risk? · Grant Demaree · Jan 1, 2023, 7:02 PM · 50 points · 12 comments · 1 min read
Trying to isolate objectives: approaches toward high-level interpretability · Jozdien · Jan 9, 2023, 6:33 PM · 49 points · 14 comments · 8 min read
Citability of Lesswrong and the Alignment Forum · Leon Lang · Jan 8, 2023, 10:12 PM · 48 points · 2 comments · 1 min read
Language models can generate superior text compared to their input · ChristianKl · Jan 17, 2023, 10:57 AM · 48 points · 28 comments · 1 min read
[Crosspost] ACX 2022 Prediction Contest Results · Scott Alexander, Eric Neyman and Sam Marks · Jan 24, 2023, 6:56 AM · 48 points · 6 comments · 8 min read
[RFC] Possible ways to expand on “Discovering Latent Knowledge in Language Models Without Supervision” · gekaklam, Walter Laurito, Kaarel and Kay Kozaronek · Jan 25, 2023, 7:03 PM · 48 points · 6 comments · 12 min read
How-to Transformer Mechanistic Interpretability—in 50 lines of code or less! · StefanHex · Jan 24, 2023, 6:45 PM · 47 points · 5 comments · 13 min read
[Question] What specific thing would you do with AI Alignment Research Assistant GPT? · quetzal_rainbow · Jan 8, 2023, 7:24 PM · 47 points · 9 comments · 1 min read