LessWrong Archive: Page 2
FTL travel summary (Isaac King) · Dec 4, 2023, 5:17 AM · 1 point · 3 comments · 3 min read
Disappointing Table Refinishing (jefftk) · Dec 4, 2023, 2:50 AM · 14 points · 3 comments · 1 min read · (www.jefftk.com)
the micro-fulfillment cambrian explosion (bhauth) · Dec 4, 2023, 1:15 AM · 54 points · 5 comments · 4 min read · (www.bhauth.com)
Nietzsche’s Morality in Plain English (Arjun Panickssery) · Dec 4, 2023, 12:57 AM · 92 points · 14 comments · 4 min read · 1 review · (arjunpanickssery.substack.com)
Meditations on Mot (Richard_Ngo) · Dec 4, 2023, 12:19 AM · 56 points · 11 comments · 8 min read · (www.mindthefuture.info)
The Witness (Richard_Ngo) · Dec 3, 2023, 10:27 PM · 105 points · 5 comments · 14 min read · (www.narrativeark.xyz)
Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of “Scheming AIs”) (Joe Carlsmith) · Dec 3, 2023, 6:32 PM · 9 points · 0 comments · 17 min read
[Question] How do you do post mortems? (matto) · Dec 3, 2023, 2:46 PM · 9 points · 2 comments · 1 min read
The benefits and risks of optimism (about AI safety) (Karl von Wendt) · Dec 3, 2023, 12:45 PM · −7 points · 6 comments · 5 min read
Book Review: 1948 by Benny Morris (Yair Halberstadt) · Dec 3, 2023, 10:29 AM · 41 points · 9 comments · 12 min read
Quick takes on “AI is easy to control” (So8res) · Dec 2, 2023, 10:31 PM · 26 points · 49 comments · 4 min read
The goal-guarding hypothesis (Section 2.3.1.1 of “Scheming AIs”) (Joe Carlsmith) · Dec 2, 2023, 3:20 PM · 8 points · 1 comment · 15 min read
The Method of Loci: With some brief remarks, including transformers and evaluating AIs (Bill Benzon) · Dec 2, 2023, 2:36 PM · 6 points · 0 comments · 3 min read
Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition (Adrià Moret) · Dec 2, 2023, 2:07 PM · 26 points · 31 comments · 42 min read
Out-of-distribution Bioattacks (jefftk) · Dec 2, 2023, 12:20 PM · 66 points · 15 comments · 2 min read · (www.jefftk.com)
After Alignment — Dialogue between RogerDearnaley and Seth Herd (RogerDearnaley and Seth Herd) · Dec 2, 2023, 6:03 AM · 15 points · 2 comments · 25 min read
List of strategies for mitigating deceptive alignment (joshc) · Dec 2, 2023, 5:56 AM · 38 points · 2 comments · 6 min read
[Question] What is known about invariants in self-modifying systems? (mishka) · Dec 2, 2023, 5:04 AM · 9 points · 2 comments · 1 min read
2023 Unofficial LessWrong Census/Survey (Screwtape) · Dec 2, 2023, 4:41 AM · 169 points · 81 comments · 1 min read
Protecting against sudden capability jumps during training (Nikola Jurkovic) · Dec 2, 2023, 4:22 AM · 15 points · 2 comments · 2 min read
South Bay Pre-Holiday Gathering (IS) · Dec 2, 2023, 3:21 AM · 10 points · 2 comments · 1 min read
MATS Summer 2023 Retrospective (utilistrutil, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald and LauraVaughan) · Dec 1, 2023, 11:29 PM · 77 points · 34 comments · 26 min read
Complex systems research as a field (and its relevance to AI Alignment) (Nora_Ammann and habryka) · Dec 1, 2023, 10:10 PM · 65 points · 11 comments · 19 min read
[Question] Could there be “natural impact regularization” or “impact regularization by default”? (tailcalled) · Dec 1, 2023, 10:01 PM · 24 points · 6 comments · 1 min read
Benchmarking Bowtie2 Threading (jefftk) · Dec 1, 2023, 8:20 PM · 9 points · 0 comments · 1 min read · (www.jefftk.com)
Please Bet On My Quantified Self Decision Markets (niplav) · Dec 1, 2023, 8:07 PM · 36 points · 6 comments · 6 min read
Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video] (Writer) · Dec 1, 2023, 7:30 PM · 19 points · 0 comments · 5 min read · (youtu.be)
Carving up problems at their joints (Jakub Smékal) · Dec 1, 2023, 6:48 PM · 1 point · 0 comments · 2 min read · (jakubsmekal.com)
Queuing theory: Benefits of operating at 60% capacity (ampdot) · Dec 1, 2023, 6:48 PM · 43 points · 4 comments · 1 min read · (less.works)
Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002) (ampdot) · Dec 1, 2023, 6:48 PM · 14 points · 0 comments · 1 min read · (airtable.com)
Kolmogorov Complexity Lays Bare the Soul (jakej) · Dec 1, 2023, 6:29 PM · 5 points · 8 comments · 2 min read
Thoughts on “AI is easy to control” by Pope & Belrose (Steven Byrnes) · Dec 1, 2023, 5:30 PM · 197 points · 63 comments · 14 min read · 1 review
Why Did NEPA Peak in 2016? (Maxwell Tabarrok) · Dec 1, 2023, 4:18 PM · 10 points · 0 comments · 3 min read · (maximumprogress.substack.com)
Worlds where I wouldn’t worry about AI risk (adekcz) · Dec 1, 2023, 4:06 PM · 2 points · 0 comments · 4 min read
How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”) (Joe Carlsmith) · Dec 1, 2023, 2:51 PM · 10 points · 1 comment · 7 min read
Reality is whatever you can get away with. (sometimesperson) · Dec 1, 2023, 7:50 AM · −5 points · 0 comments · 1 min read
Reinforcement Learning using Layered Morphology (RLLM) (MiguelDev) · Dec 1, 2023, 5:18 AM · 7 points · 0 comments · 29 min read
[Question] Is OpenAI losing money on each request? (thenoviceoof) · Dec 1, 2023, 3:27 AM · 8 points · 8 comments · 5 min read
How useful is mechanistic interpretability? (ryan_greenblatt, Neel Nanda, Buck and habryka) · Dec 1, 2023, 2:54 AM · 167 points · 54 comments · 25 min read
FixDT (abramdemski) · Nov 30, 2023, 9:57 PM · 64 points · 15 comments · 14 min read · 1 review
Generalization, from thermodynamics to statistical physics (Jesse Hoogland) · Nov 30, 2023, 9:28 PM · 64 points · 9 comments · 28 min read
What’s next for the field of Agent Foundations? (Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott) · Nov 30, 2023, 5:55 PM · 59 points · 23 comments · 10 min read
A Proposed Cure for Alzheimer’s Disease??? (MadHatter) · Nov 30, 2023, 5:37 PM · 4 points · 30 comments · 2 min read
AI #40: A Vision from Vitalik (Zvi) · Nov 30, 2023, 5:30 PM · 53 points · 12 comments · 42 min read · (thezvi.wordpress.com)
Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”) (Joe Carlsmith) · Nov 30, 2023, 4:43 PM · 8 points · 0 comments · 6 min read
A Formula for Violence (and Its Antidote) (MadHatter) · Nov 30, 2023, 4:04 PM · −22 points · 6 comments · 1 min read · (blog.simpleheart.org)
Enkrateia: a safe model-based reinforcement learning algorithm (MadHatter) · Nov 30, 2023, 3:51 PM · −15 points · 4 comments · 2 min read · (github.com)
Normative Ethics vs Utilitarianism (Logan Zoellner) · Nov 30, 2023, 3:36 PM · 6 points · 0 comments · 2 min read · (midwitalignment.substack.com)
Information-Theoretic Boxing of Superintelligences (JustinShovelain and Elliot Mckernon) · Nov 30, 2023, 2:31 PM · 30 points · 0 comments · 7 min read
OpenAI: Altman Returns (Zvi) · Nov 30, 2023, 2:10 PM · 66 points · 12 comments · 11 min read · (thezvi.wordpress.com)