Page 2
The Filan Cabinet Podcast with Oliver Habryka—Transcript · MondSemmel and RobertM · Feb 14, 2023, 2:38 AM · 101 points · 9 comments · 72 min read
Latent variables for prediction markets: motivation, technical guide, and design considerations · tailcalled · Feb 12, 2023, 5:54 PM · 100 points · 25 comments · 23 min read · 2 reviews
Don’t accelerate problems you’re trying to solve · Andrea_Miotti and remember · Feb 15, 2023, 6:11 PM · 100 points · 27 comments · 4 min read
Basic facts about language models during training · beren · Feb 21, 2023, 11:46 AM · 98 points · 15 comments · 18 min read
A circuit for Python docstrings in a 4-layer attention-only transformer · StefanHex and Jett Janiak · Feb 20, 2023, 7:35 PM · 96 points · 8 comments · 21 min read
Research agenda: Formalizing abstractions of computations · Erik Jenner · Feb 2, 2023, 4:29 AM · 93 points · 10 comments · 31 min read
Covid 2/23/23: Your Best Possible Situation · Zvi · Feb 23, 2023, 1:10 PM · 92 points · 9 comments · 5 min read · (thezvi.wordpress.com)
Exercise is Good, Actually · Gordon Seidoh Worley · Feb 2, 2023, 12:09 AM · 91 points · 27 comments · 3 min read
SolidGoldMagikarp III: Glitch token archaeology · mwatkins and Jessica Rumbelow · Feb 14, 2023, 10:17 AM · 91 points · 35 comments · 16 min read
Retrospective on the 2022 Conjecture AI Discussions · Andrea_Miotti · Feb 24, 2023, 10:41 PM · 90 points · 5 comments · 2 min read
Deceptive Alignment is <1% Likely by Default · DavidW · Feb 21, 2023, 3:09 PM · 89 points · 31 comments · 14 min read · 1 review
Conditioning Predictive Models: Large language models as predictors · evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton · Feb 2, 2023, 8:28 PM · 88 points · 4 comments · 13 min read
Qualities that alignment mentors value in junior researchers · Orpheus16 · Feb 14, 2023, 11:27 PM · 88 points · 14 comments · 3 min read
Podcast with Oli Habryka on LessWrong / Lightcone Infrastructure · DanielFilan · Feb 5, 2023, 2:52 AM · 88 points · 20 comments · 1 min read · (thefilancabinet.com)
The Cave Allegory Revisited: Understanding GPT’s Worldview · Jan_Kulveit · Feb 14, 2023, 4:00 PM · 86 points · 5 comments · 3 min read
Building and Entertaining Couples · Jacob Falkovich · Feb 22, 2023, 7:02 PM · 86 points · 11 comments · 4 min read
Decision Transformer Interpretability · Joseph Bloom and Paul Colognese · Feb 6, 2023, 7:29 AM · 85 points · 13 comments · 24 min read
You are probably not a good alignment researcher, and other blatant lies · junk heap homotopy · Feb 2, 2023, 1:55 PM · 83 points · 16 comments · 2 min read
LLM Basics: Embedding Spaces—Transformer Token Vectors Are Not Points in Space · NickyP · Feb 13, 2023, 6:52 PM · 83 points · 11 comments · 15 min read
Bankless Podcast: 159 - We’re All Gonna Die with Eliezer Yudkowsky · bayesed · Feb 20, 2023, 4:42 PM · 83 points · 54 comments · 1 min read · (www.youtube.com)
Teleosemantics! · abramdemski · Feb 23, 2023, 11:26 PM · 82 points · 27 comments · 6 min read · 1 review
Tools for finding information on the internet · RomanHauksson · Feb 9, 2023, 5:05 PM · 79 points · 11 comments · 2 min read · (roman.computer)
OpenAI/Microsoft announce “next generation language model” integrated into Bing/Edge · LawrenceC · Feb 7, 2023, 8:38 PM · 79 points · 4 comments · 1 min read · (blogs.microsoft.com)
Two problems with ‘Simulators’ as a frame · ryan_greenblatt · Feb 17, 2023, 11:34 PM · 79 points · 13 comments · 5 min read
[Linkpost] Google invested $300M in Anthropic in late 2022 · Orpheus16 · Feb 3, 2023, 7:13 PM · 73 points · 14 comments · 1 min read · (www.ft.com)
Review of AI Alignment Progress · PeterMcCluskey · Feb 7, 2023, 6:57 PM · 72 points · 32 comments · 7 min read · (bayesianinvestor.com)
Conditioning Predictive Models: Outer alignment via careful conditioning · evhub, Adam Jermyn, Johannes Treutlein, Rubi J. Hudson and kcwoolverton · Feb 2, 2023, 8:28 PM · 72 points · 15 comments · 57 min read
Why I’m not working on {debate, RRM, ELK, natural abstractions} · Steven Byrnes · Feb 10, 2023, 7:22 PM · 71 points · 19 comments · 10 min read
Prizes for the 2021 Review · Raemon · Feb 10, 2023, 7:47 PM · 69 points · 2 comments · 4 min read
Here’s Why I’m Hesitant To Respond In More Depth · DirectedEvolution · Feb 6, 2023, 6:36 PM · 67 points · 10 comments · 4 min read · 1 review
Voting Results for the 2021 Review · Raemon · Feb 1, 2023, 8:02 AM · 66 points · 10 comments · 38 min read
The Preference Fulfillment Hypothesis · Kaj_Sotala · Feb 26, 2023, 10:55 AM · 66 points · 62 comments · 11 min read
Paper: The Capacity for Moral Self-Correction in Large Language Models (Anthropic) · LawrenceC · Feb 16, 2023, 7:47 PM · 65 points · 9 comments · 1 min read · (arxiv.org)
On Developing a Mathematical Theory of Interpretability · carboniferous_umbraculum · Feb 9, 2023, 1:45 AM · 64 points · 8 comments · 6 min read
Rationality-related things I don’t know as of 2023 · Adam Zerner · Feb 11, 2023, 6:04 AM · 64 points · 59 comments · 3 min read
Emergent Deception and Emergent Optimization · jsteinhardt · Feb 20, 2023, 2:40 AM · 64 points · 0 comments · 14 min read · (bounded-regret.ghost.io)
I Am Scared of Posting Negative Takes About Bing’s AI · Yitz · Feb 17, 2023, 8:50 PM · 63 points · 28 comments · 1 min read
Speedrunning 4 mistakes you make when your alignment strategy is based on formal proof · Quinn · Feb 16, 2023, 1:13 AM · 63 points · 18 comments · 2 min read
Learning How to Learn (And 20+ Studies) · maxa · Feb 26, 2023, 10:46 PM · 63 points · 12 comments · 6 min read · (max2c.com)
Aiming for Convergence Is Like Discouraging Betting · Zack_M_Davis · Feb 1, 2023, 12:03 AM · 62 points · 18 comments · 11 min read · 1 review
Are short timelines actually bad? · joshc · Feb 5, 2023, 9:21 PM · 61 points · 7 comments · 3 min read
Christiano (ARC) and GA (Conjecture) Discuss Alignment Cruxes · Andrea_Miotti, paulfchristiano, Gabriel Alfour and OliviaJ · Feb 24, 2023, 11:03 PM · 61 points · 7 comments · 47 min read
Buddhist Psychotechnology for Withstanding Apocalypse Stress · romeostevensit · Feb 25, 2023, 3:11 AM · 61 points · 10 comments · 5 min read
A mechanistic explanation for SolidGoldMagikarp-like tokens in GPT2 · MadHatter · Feb 26, 2023, 1:10 AM · 61 points · 14 comments · 6 min read
Who invented knitting? The plot thickens · eukaryote · Feb 5, 2023, 12:24 AM · 60 points · 9 comments · 19 min read · (eukaryotewritesblog.com)
AGI systems & humans will both need to solve the alignment problem · Jeffrey Ladish · Feb 24, 2023, 3:29 AM · 59 points · 14 comments · 4 min read
Human beats SOTA Go AI by learning an adversarial policy · Vanessa Kosoy · Feb 19, 2023, 9:38 AM · 59 points · 32 comments · 1 min read · (goattack.far.ai)
Respect Chesterton-Schelling Fences · Shmi · Feb 27, 2023, 12:09 AM · 58 points · 17 comments · 1 min read
[Question] How seriously should we take the hypothesis that LW is just wrong on how AI will impact the 21st century? · Noosphere89 · Feb 16, 2023, 3:25 PM · 58 points · 66 comments · 1 min read
What is it like doing AI safety work? · KatWoods · Feb 21, 2023, 8:12 PM · 57 points · 2 comments