LessWrong Archive, page 2
Speaking to Congressional staffers about AI risk
Orpheus16 and hath · Dec 4, 2023, 11:08 PM · 312 points · 25 comments · 15 min read · LW link · 1 review

Open Thread – Winter 2023/2024
habryka · Dec 4, 2023, 10:59 PM · 35 points · 160 comments · 1 min read · LW link

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · Dec 4, 2023, 10:58 PM · 37 points · 0 comments · 35 min read · LW link

2023 Alignment Research Updates from FAR AI
AdamGleave and EuanMcLean · Dec 4, 2023, 10:32 PM · 18 points · 0 comments · 8 min read · LW link (far.ai)

What’s new at FAR AI
AdamGleave and EuanMcLean · Dec 4, 2023, 9:18 PM · 41 points · 0 comments · 5 min read · LW link (far.ai)

n of m ring signatures
DanielFilan · Dec 4, 2023, 8:00 PM · 51 points · 7 comments · 1 min read · LW link (danielfilan.com)

Mechanistic interpretability through clustering
Alistair Fraser · Dec 4, 2023, 6:49 PM · 1 point · 0 comments · 1 min read · LW link

Agents which are EU-maximizing as a group are not EU-maximizing individually
Mlxa · Dec 4, 2023, 6:49 PM · 3 points · 2 comments · 2 min read · LW link

Planning in LLMs: Insights from AlphaGo
jco · Dec 4, 2023, 6:48 PM · 8 points · 10 comments · 11 min read · LW link

Non-classic stories about scheming (Section 2.3.2 of “Scheming AIs”)
Joe Carlsmith · Dec 4, 2023, 6:44 PM · 9 points · 0 comments · 20 min read · LW link

6. The Mutable Values Problem in Value Learning and CEV
RogerDearnaley · Dec 4, 2023, 6:31 PM · 12 points · 0 comments · 49 min read · LW link

Updates to Open Phil’s career development and transition funding program
abergal and Bastian Stern · Dec 4, 2023, 6:10 PM · 28 points · 0 comments · 2 min read · LW link

[Valence series] 1. Introduction
Steven Byrnes · Dec 4, 2023, 3:40 PM · 99 points · 16 comments · 16 min read · LW link · 2 reviews
South Bay Meetup 12/9
David Friedman · Dec 4, 2023, 7:32 AM · 2 points · 0 comments · 1 min read · LW link

Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation
Paul Bricman · Dec 4, 2023, 7:31 AM · 12 points · 6 comments · 16 min read · LW link (arxiv.org)

A call for a quantitative report card for AI bioterrorism threat models
Juno · Dec 4, 2023, 6:35 AM · 12 points · 0 comments · 10 min read · LW link

FTL travel summary
Isaac King · Dec 4, 2023, 5:17 AM · 1 point · 3 comments · 3 min read · LW link

Disappointing Table Refinishing
jefftk · Dec 4, 2023, 2:50 AM · 14 points · 3 comments · 1 min read · LW link (www.jefftk.com)

the micro-fulfillment cambrian explosion
bhauth · Dec 4, 2023, 1:15 AM · 54 points · 5 comments · 4 min read · LW link (www.bhauth.com)

Nietzsche’s Morality in Plain English
Arjun Panickssery · Dec 4, 2023, 12:57 AM · 92 points · 14 comments · 4 min read · LW link · 1 review (arjunpanickssery.substack.com)

Meditations on Mot
Richard_Ngo · Dec 4, 2023, 12:19 AM · 56 points · 11 comments · 8 min read · LW link (www.mindthefuture.info)

The Witness
Richard_Ngo · Dec 3, 2023, 10:27 PM · 105 points · 5 comments · 14 min read · LW link (www.narrativeark.xyz)

Does scheming lead to adequate future empowerment? (Section 2.3.1.2 of “Scheming AIs”)
Joe Carlsmith · Dec 3, 2023, 6:32 PM · 9 points · 0 comments · 17 min read · LW link

[Question] How do you do post mortems?
matto · Dec 3, 2023, 2:46 PM · 9 points · 2 comments · 1 min read · LW link

The benefits and risks of optimism (about AI safety)
Karl von Wendt · Dec 3, 2023, 12:45 PM · −7 points · 6 comments · 5 min read · LW link

Book Review: 1948 by Benny Morris
Yair Halberstadt · Dec 3, 2023, 10:29 AM · 41 points · 9 comments · 12 min read · LW link

Quick takes on “AI is easy to control”
So8res · Dec 2, 2023, 10:31 PM · 26 points · 49 comments · 4 min read · LW link

The goal-guarding hypothesis (Section 2.3.1.1 of “Scheming AIs”)
Joe Carlsmith · Dec 2, 2023, 3:20 PM · 8 points · 1 comment · 15 min read · LW link
The Method of Loci: With some brief remarks, including transformers and evaluating AIs
Bill Benzon · Dec 2, 2023, 14:36 UTC · 6 points · 0 comments · 3 min read · LW link

Taking Into Account Sentient Non-Humans in AI Ambitious Value Learning: Sentientist Coherent Extrapolated Volition
Adrià Moret · Dec 2, 2023, 14:07 UTC · 26 points · 31 comments · 42 min read · LW link

Out-of-distribution Bioattacks
jefftk · Dec 2, 2023, 12:20 UTC · 66 points · 15 comments · 2 min read · LW link (www.jefftk.com)

After Alignment — Dialogue between RogerDearnaley and Seth Herd
RogerDearnaley and Seth Herd · Dec 2, 2023, 6:03 UTC · 15 points · 2 comments · 25 min read · LW link

List of strategies for mitigating deceptive alignment
joshc · Dec 2, 2023, 5:56 UTC · 38 points · 2 comments · 6 min read · LW link

[Question] What is known about invariants in self-modifying systems?
mishka · Dec 2, 2023, 5:04 UTC · 9 points · 2 comments · 1 min read · LW link

2023 Unofficial LessWrong Census/Survey
Screwtape · Dec 2, 2023, 4:41 UTC · 169 points · 81 comments · 1 min read · LW link

Protecting against sudden capability jumps during training
Nikola Jurkovic · Dec 2, 2023, 4:22 UTC · 15 points · 2 comments · 2 min read · LW link

South Bay Pre-Holiday Gathering
IS · Dec 2, 2023, 3:21 UTC · 10 points · 2 comments · 1 min read · LW link

MATS Summer 2023 Retrospective
utilistrutil, Juan Gil, Ryan Kidd, Christian Smith, McKennaFitzgerald and LauraVaughan · Dec 1, 2023, 23:29 UTC · 77 points · 34 comments · 26 min read · LW link

Complex systems research as a field (and its relevance to AI Alignment)
Nora_Ammann and habryka · Dec 1, 2023, 22:10 UTC · 65 points · 11 comments · 19 min read · LW link

[Question] Could there be “natural impact regularization” or “impact regularization by default”?
tailcalled · Dec 1, 2023, 22:01 UTC · 24 points · 6 comments · 1 min read · LW link

Benchmarking Bowtie2 Threading
jefftk · Dec 1, 2023, 20:20 UTC · 9 points · 0 comments · 1 min read · LW link (www.jefftk.com)

Please Bet On My Quantified Self Decision Markets
niplav · Dec 1, 2023, 20:07 UTC · 36 points · 6 comments · 6 min read · LW link

Specification Gaming: How AI Can Turn Your Wishes Against You [RA Video]
Writer · Dec 1, 2023, 19:30 UTC · 19 points · 0 comments · 5 min read · LW link (youtu.be)

Carving up problems at their joints
Jakub Smékal · Dec 1, 2023, 18:48 UTC · 1 point · 0 comments · 2 min read · LW link (jakubsmekal.com)

Queuing theory: Benefits of operating at 60% capacity
ampdot · Dec 1, 2023, 18:48 UTC · 43 points · 4 comments · 1 min read · LW link (less.works)

Researchers and writers can apply for proxy access to the GPT-3.5 base model (code-davinci-002)
ampdot · Dec 1, 2023, 18:48 UTC · 14 points · 0 comments · 1 min read · LW link (airtable.com)

Kolmogorov Complexity Lays Bare the Soul
jakej · Dec 1, 2023, 18:29 UTC · 5 points · 8 comments · 2 min read · LW link

Thoughts on “AI is easy to control” by Pope & Belrose
Steven Byrnes · Dec 1, 2023, 17:30 UTC · 197 points · 63 comments · 14 min read · LW link · 1 review

Why Did NEPA Peak in 2016?
Maxwell Tabarrok · Dec 1, 2023, 16:18 UTC · 10 points · 0 comments · 3 min read · LW link (maximumprogress.substack.com)

Worlds where I wouldn’t worry about AI risk
adekcz · Dec 1, 2023, 16:06 UTC · 2 points · 0 comments · 4 min read · LW link