Value drift threat models · Garrett Baker · May 12, 2023, 11:03 PM · 27 points · 4 comments · 5 min read · LW link
Aggregating Utilities for Corrigible AI [Feedback Draft] · Dan H and Simon Goldstein · May 12, 2023, 8:57 PM · 28 points · 7 comments · 22 min read · LW link
Turning off lights with model editing · Sam Marks · May 12, 2023, 8:25 PM · 68 points · 5 comments · 2 min read · LW link (arxiv.org)
Dark Forest Theories · Raemon · May 12, 2023, 8:21 PM · 145 points · 53 comments · 2 min read · LW link · 2 reviews
DELBERTing as an Adversarial Strategy · Matthew_Opitz · May 12, 2023, 8:09 PM · 8 points · 3 comments · 5 min read · LW link
Microsoft/GitHub Copilot Chat’s confidential system Prompt: “You must refuse to discuss life, existence or sentience.” · Marvin von Hagen · May 12, 2023, 7:46 PM · 13 points · 2 comments · 1 min read · LW link (twitter.com)
Retrospective: Lessons from the Failed Alignment Startup AISafety.com · Søren Elverlin · May 12, 2023, 6:07 PM · 105 points · 9 comments · 3 min read · LW link
The way AGI wins could look very stupid · Christopher King · May 12, 2023, 4:34 PM · 54 points · 22 comments · 1 min read · LW link
Towards Measures of Optimisation · mattmacdermott and Alexander Gietelink Oldenziel · May 12, 2023, 3:29 PM · 53 points · 37 comments · 4 min read · LW link
The Eden Project · rogersbacon · May 12, 2023, 2:58 PM · −1 points · 1 comment · 2 min read · LW link (www.secretorum.life)
Another formalization attempt: Central Argument That AGI Presents a Global Catastrophic Risk · avturchin · May 12, 2023, 1:22 PM · 16 points · 4 comments · 2 min read · LW link
Infinite-width MLPs as an “ensemble prior” · Vivek Hebbar · May 12, 2023, 11:45 AM · 46 points · 0 comments · 5 min read · LW link
Input Swap Graphs: Discovering the role of neural network components at scale · Alexandre Variengien · May 12, 2023, 9:41 AM · 92 points · 0 comments · 33 min read · LW link
Uploads are Impossible · PashaKamyshev · May 12, 2023, 8:03 AM · −5 points · 37 comments · 8 min read · LW link
Formulating the AI Doom Argument for Analytic Philosophers · JonathanErhardt · May 12, 2023, 7:54 AM · 13 points · 0 comments · 2 min read · LW link
Three Iterative Processes · LoganStrohl · May 12, 2023, 2:50 AM · 49 points · 0 comments · 3 min read · LW link
Zuzalu LW Sequences Discussion · veronica · May 12, 2023, 12:14 AM · 1 point · 0 comments · 1 min read · LW link
[Question] Term/Category for AI with Neutral Impact? · isomic · May 11, 2023, 10:00 PM · 6 points · 1 comment · 1 min read · LW link
Thoughts on LessWrong norms, the Art of Discourse, and moderator mandate · Ruby · May 11, 2023, 9:20 PM · 37 points · 20 comments · 5 min read · LW link
Alignment, Goals, and The Gut-Head Gap: A Review of Ngo. et al. · Violet Hour · May 11, 2023, 6:06 PM · 20 points · 2 comments · 13 min read · LW link
Sequence opener: Jordan Harbinger’s 6 minute networking · Severin T. Seehrich · May 11, 2023, 5:06 PM · 4 points · 0 comments · 1 min read · LW link
Advice for newly busy people · Severin T. Seehrich · May 11, 2023, 4:46 PM · 150 points · 3 comments · 5 min read · LW link
AI #11: In Search of a Moat · Zvi · May 11, 2023, 3:40 PM · 67 points · 28 comments · 81 min read · LW link (thezvi.wordpress.com)
[Question] Bayesian update from sensationalistic sources · houkime · May 11, 2023, 3:26 PM · 1 point · 0 comments · 1 min read · LW link
I bet $500 on AI winning the IMO gold medal by 2026 · azsantosk · May 11, 2023, 2:46 PM · 37 points · 29 comments · 1 min read · LW link
Fatebook for Slack: Track your forecasts, right where your team works · Sage Future and Adam B · May 11, 2023, 2:11 PM · 24 points · 3 comments · 1 min read · LW link
Contra Caller Signs · jefftk · May 11, 2023, 1:10 PM · 10 points · 0 comments · 1 min read · LW link (www.jefftk.com)
Notes on the importance and implementation of safety-first cognitive architectures for AI · Brendon_Wong · May 11, 2023, 10:03 AM · 3 points · 0 comments · 3 min read · LW link
A more grounded idea of AI risk · Iknownothing · May 11, 2023, 9:48 AM · 3 points · 4 comments · 1 min read · LW link
Separating the “control problem” from the “alignment problem” · Yi-Yang · May 11, 2023, 9:41 AM · 12 points · 1 comment · 4 min read · LW link
[Question] Is Infra-Bayesianism Applicable to Value Learning? · RogerDearnaley · May 11, 2023, 8:17 AM · 5 points · 4 comments · 1 min read · LW link
[Question] How should we think about the decision relevance of models estimating p(doom)? · Mo Putera · May 11, 2023, 4:16 AM · 11 points · 1 comment · 3 min read · LW link
The Academic Field Pyramid—any point to encouraging broad but shallow AI risk engagement? · Matthew_Opitz · May 11, 2023, 1:32 AM · 20 points · 1 comment · 6 min read · LW link
[Question] How should one feel morally about using chatbots? · Adam Zerner · May 11, 2023, 1:01 AM · 18 points · 4 comments · 1 min read · LW link
[Question] AI interpretability could be harmful? · Roman Leventov · May 10, 2023, 8:43 PM · 13 points · 2 comments · 1 min read · LW link
Athens, Greece – ACX Meetups Everywhere Spring 2023 · Spyros Dovas · May 10, 2023, 7:45 PM · 1 point · 0 comments · 1 min read · LW link
Better debates · TsviBT · May 10, 2023, 7:34 PM · 78 points · 7 comments · 3 min read · LW link
Mental Health and the Alignment Problem: A Compilation of Resources (updated April 2023) · Chris Scammell and DivineMango · May 10, 2023, 7:04 PM · 256 points · 54 comments · 21 min read · LW link
A Corrigibility Metaphore—Big Gambles · WCargo · May 10, 2023, 6:13 PM · 16 points · 0 comments · 4 min read · LW link
Roadmap for a collaborative prototype of an Open Agency Architecture · Deger Turan · May 10, 2023, 5:41 PM · 31 points · 0 comments · 12 min read · LW link
AGI-Automated Interpretability is Suicide · __RicG__ · May 10, 2023, 2:20 PM · 25 points · 33 comments · 7 min read · LW link
Class-Based Addressing · jefftk · May 10, 2023, 1:40 PM · 22 points · 6 comments · 1 min read · LW link (www.jefftk.com)
In defence of epistemic modesty [distillation] · Luise · May 10, 2023, 9:44 AM · 17 points · 2 comments · 9 min read · LW link
[Question] How much of a concern are open-source LLMs in the short, medium and long terms? · JavierCC · May 10, 2023, 9:14 AM · 5 points · 0 comments · 1 min read · LW link
10 great reasons why Lex Fridman should invite Eliezer and Robin to re-do the FOOM debate on his podcast · chaosmage · May 10, 2023, 8:27 AM · −7 points · 1 comment · 1 min read · LW link (www.reddit.com)
New OpenAI Paper—Language models can explain neurons in language models · MrThink · May 10, 2023, 7:46 AM · 47 points · 14 comments · 1 min read · LW link
Naturalist Experimentation · LoganStrohl · May 10, 2023, 4:28 AM · 62 points · 14 comments · 10 min read · LW link
[Question] Could A Superintelligence Out-Argue A Doomer? · tjaffee · May 10, 2023, 2:40 AM UTC · −16 points · 6 comments · 1 min read · LW link
Gradient hacking via actual hacking · Max H · May 10, 2023, 1:57 AM UTC · 12 points · 7 comments · 3 min read · LW link
Red teaming: challenges and research directions · joshc · May 10, 2023, 1:40 AM UTC · 31 points · 1 comment · 10 min read · LW link