MATS Summer 2023 Retrospective · utilistrutil, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald and LauraVaughan · Dec 1, 2023, 11:29 PM · 77 points · 34 comments · 26 min read · LW link
Complex systems research as a field (and its relevance to AI Alignment) · Nora_Ammann and habryka · Dec 1, 2023, 10:10 PM · 65 points · 11 comments · 19 min read · LW link
[Question] Could there be “natural impact regularization” or “impact regularization by default”? · tailcalled · Dec 1, 2023, 10:01 PM · 24 points · 6 comments · 1 min read · LW link
Benchmarking Bowtie2 Threading · jefftk · Dec 1, 2023, 8:20 PM · 9 points · 0 comments · 1 min read · LW link · (www.jefftk.com)
Please Bet On My Quantified Self Decision Markets · niplav · Dec 1, 2023, 8:07 PM · 36 points · 6 comments · 6 min read · LW link
Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video] · Writer · Dec 1, 2023, 7:30 PM · 19 points · 0 comments · 5 min read · LW link · (youtu.be)
Carving up problems at their joints · Jakub Smékal · Dec 1, 2023, 6:48 PM · 1 point · 0 comments · 2 min read · LW link · (jakubsmekal.com)
Queuing theory: Benefits of operating at 60% capacity · ampdot · Dec 1, 2023, 6:48 PM · 43 points · 4 comments · 1 min read · LW link · (less.works)
Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002) · ampdot · Dec 1, 2023, 6:48 PM · 14 points · 0 comments · 1 min read · LW link · (airtable.com)
Kolmogorov Complexity Lays Bare the Soul · jakej · Dec 1, 2023, 6:29 PM · 5 points · 8 comments · 2 min read · LW link
Thoughts on “AI is easy to control” by Pope & Belrose · Steven Byrnes · Dec 1, 2023, 5:30 PM · 197 points · 63 comments · 14 min read · LW link · 1 review
Why Did NEPA Peak in 2016? · Maxwell Tabarrok · Dec 1, 2023, 4:18 PM · 10 points · 0 comments · 3 min read · LW link · (maximumprogress.substack.com)
Worlds where I wouldn’t worry about AI risk · adekcz · Dec 1, 2023, 4:06 PM · 2 points · 0 comments · 4 min read · LW link
How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”) · Joe Carlsmith · Dec 1, 2023, 2:51 PM · 10 points · 1 comment · 7 min read · LW link
Reality is whatever you can get away with. · sometimesperson · Dec 1, 2023, 7:50 AM · −5 points · 0 comments · 1 min read · LW link
Reinforcement Learning using Layered Morphology (RLLM) · MiguelDev · Dec 1, 2023, 5:18 AM · 7 points · 0 comments · 29 min read · LW link
[Question] Is OpenAI losing money on each request? · thenoviceoof · Dec 1, 2023, 3:27 AM · 8 points · 8 comments · 5 min read · LW link
How useful is mechanistic interpretability? · ryan_greenblatt, Neel Nanda, Buck and habryka · Dec 1, 2023, 2:54 AM · 167 points · 54 comments · 25 min read · LW link
FixDT · abramdemski · Nov 30, 2023, 9:57 PM · 64 points · 15 comments · 14 min read · LW link · 1 review
Generalization, from thermodynamics to statistical physics · Jesse Hoogland · Nov 30, 2023, 9:28 PM · 64 points · 9 comments · 28 min read · LW link
What’s next for the field of Agent Foundations? · Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott · Nov 30, 2023, 5:55 PM · 59 points · 23 comments · 10 min read · LW link
A Proposed Cure for Alzheimer’s Disease??? · MadHatter · Nov 30, 2023, 5:37 PM · 4 points · 30 comments · 2 min read · LW link
AI #40: A Vision from Vitalik · Zvi · Nov 30, 2023, 5:30 PM · 53 points · 12 comments · 42 min read · LW link · (thezvi.wordpress.com)
Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”) · Joe Carlsmith · Nov 30, 2023, 4:43 PM · 8 points · 0 comments · 6 min read · LW link
A Formula for Violence (and Its Antidote) · MadHatter · Nov 30, 2023, 4:04 PM · −22 points · 6 comments · 1 min read · LW link · (blog.simpleheart.org)
Enkrateia: a safe model-based reinforcement learning algorithm · MadHatter · Nov 30, 2023, 3:51 PM · −15 points · 4 comments · 2 min read · LW link · (github.com)
Normative Ethics vs Utilitarianism · Logan Zoellner · Nov 30, 2023, 3:36 PM · 6 points · 0 comments · 2 min read · LW link · (midwitalignment.substack.com)
Information-Theoretic Boxing of Superintelligences · JustinShovelain and Elliot Mckernon · Nov 30, 2023, 2:31 PM · 30 points · 0 comments · 7 min read · LW link
OpenAI: Altman Returns · Zvi · Nov 30, 2023, 2:10 PM · 66 points · 12 comments · 11 min read · LW link · (thezvi.wordpress.com)
[Linkpost] Remarks on the Convergence in Distribution of Random Neural Networks to Gaussian Processes in the Infinite Width Limit · carboniferous_umbraculum · Nov 30, 2023, 2:01 PM · 9 points · 0 comments · 1 min read · LW link · (drive.google.com)
[Question] Buy Nothing Day is a great idea with a terrible app— why has nobody built a killer app for crowdsourced ‘effective communism’ yet? · lillybaeum · Nov 30, 2023, 1:47 PM · 8 points · 17 comments · 1 min read · LW link
[Question] Comprehensible Input is the only way people learn languages—is it the only way people *learn*? · lillybaeum · Nov 30, 2023, 1:31 PM · 8 points · 2 comments · 3 min read · LW link
Some Intuitions for the Ethicophysics · MadHatter and mishka · Nov 30, 2023, 6:47 AM · 2 points · 4 comments · 8 min read · LW link
The Alignment Agenda THEY Don’t Want You to Know About · MadHatter · Nov 30, 2023, 4:29 AM · −19 points · 16 comments · 1 min read · LW link
Cis fragility · [deactivated] · Nov 30, 2023, 4:14 AM · −51 points · 9 comments · 3 min read · LW link
Homework Answer: Glicko Ratings for War · MadHatter · Nov 30, 2023, 4:08 AM · −45 points · 1 comment · 77 min read · LW link · (gist.github.com)
[Question] Feature Request for LessWrong · MadHatter · Nov 30, 2023, 3:19 AM · 11 points · 8 comments · 1 min read · LW link
My Alignment Research Agenda (“the Ethicophysics”) · MadHatter · Nov 30, 2023, 2:57 AM · −13 points · 0 comments · 1 min read · LW link
[Question] Stupid Question: Why am I getting consistently downvoted? · MadHatter · Nov 30, 2023, 12:21 AM · 31 points · 138 comments · 1 min read · LW link
Inositol Non-Results · Elizabeth · Nov 29, 2023, 9:40 PM · 20 points · 2 comments · 1 min read · LW link · (acesounderglass.com)
Losing Metaphors: Zip and Paste · jefftk · Nov 29, 2023, 8:31 PM · 26 points · 6 comments · 1 min read · LW link · (www.jefftk.com)
Preserving our heritage: Building a movement and a knowledge ark for current and future generations · rnk8 · Nov 29, 2023, 7:20 PM · 0 points · 5 comments · 12 min read · LW link
AGI Alignment is Absurd · Youssef Mohamed · Nov 29, 2023, 7:11 PM · −9 points · 4 comments · 3 min read · LW link
The origins of the steam engine: An essay with interactive animated diagrams · jasoncrawford · Nov 29, 2023, 6:30 PM · 30 points · 1 comment · 1 min read · LW link · (rootsofprogress.org)
ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5 · VipulNaik · Nov 29, 2023, 6:11 PM · 33 points · 16 comments · 14 min read · LW link
“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”) · Joe Carlsmith · Nov 29, 2023, 4:32 PM · 29 points · 1 comment · 11 min read · LW link
Lying Alignment Chart · Zack_M_Davis · Nov 29, 2023, 4:15 PM · 77 points · 17 comments · 1 min read · LW link
Rethink Priorities: Seeking Expressions of Interest for Special Projects Next Year · kierangreig · Nov 29, 2023, 1:59 PM · 4 points · 0 comments · 5 min read · LW link
[Question] Thoughts on teletransportation with copies? · titotal · Nov 29, 2023, 12:56 PM · 15 points · 13 comments · 1 min read · LW link
Interpretability with Sparse Autoencoders (Colab exercises) · CallumMcDougall · Nov 29, 2023, 12:56 PM · 76 points · 9 comments · 4 min read · LW link