LessWrong Archive, page 2

Worlds where I wouldn’t worry about AI risk
adekcz · Dec 1, 2023, 4:06 PM · 2 points · 0 comments · 4 min read · LW link

How useful for alignment-relevant work are AIs with short-term goals? (Section 2.2.4.3 of “Scheming AIs”)
Joe Carlsmith · Dec 1, 2023, 2:51 PM · 10 points · 1 comment · 7 min read · LW link

Reality is whatever you can get away with.
sometimesperson · Dec 1, 2023, 7:50 AM · −5 points · 0 comments · 1 min read · LW link

Reinforcement Learning using Layered Morphology (RLLM)
MiguelDev · Dec 1, 2023, 5:18 AM · 7 points · 0 comments · 29 min read · LW link

[Question] Is OpenAI losing money on each request?
thenoviceoof · Dec 1, 2023, 3:27 AM · 8 points · 8 comments · 5 min read · LW link

How useful is mechanistic interpretability?
ryan_greenblatt, Neel Nanda, Buck and habryka · Dec 1, 2023, 2:54 AM · 167 points · 54 comments · 25 min read · LW link

FixDT
abramdemski · Nov 30, 2023, 9:57 PM · 64 points · 15 comments · 14 min read · LW link · 1 review

Generalization, from thermodynamics to statistical physics
Jesse Hoogland · Nov 30, 2023, 9:28 PM · 64 points · 9 comments · 28 min read · LW link

What’s next for the field of Agent Foundations?
Nora_Ammann, Alexander Gietelink Oldenziel and mattmacdermott · Nov 30, 2023, 5:55 PM · 59 points · 23 comments · 10 min read · LW link

A Proposed Cure for Alzheimer’s Disease???
MadHatter · Nov 30, 2023, 5:37 PM · 4 points · 30 comments · 2 min read · LW link

AI #40: A Vision from Vitalik
Zvi · Nov 30, 2023, 5:30 PM · 53 points · 12 comments · 42 min read · LW link (thezvi.wordpress.com)

Is scheming more likely in models trained to have long-term goals? (Sections 2.2.4.1-2.2.4.2 of “Scheming AIs”)
Joe Carlsmith · Nov 30, 2023, 4:43 PM · 8 points · 0 comments · 6 min read · LW link

A Formula for Violence (and Its Antidote)
MadHatter · Nov 30, 2023, 4:04 PM · −22 points · 6 comments · 1 min read · LW link (blog.simpleheart.org)

Enkrateia: a safe model-based reinforcement learning algorithm
MadHatter · Nov 30, 2023, 3:51 PM · −15 points · 4 comments · 2 min read · LW link (github.com)

Normative Ethics vs Utilitarianism
Logan Zoellner · Nov 30, 2023, 3:36 PM · 6 points · 0 comments · 2 min read · LW link (midwitalignment.substack.com)

Information-Theoretic Boxing of Superintelligences
JustinShovelain and Elliot Mckernon · Nov 30, 2023, 2:31 PM · 30 points · 0 comments · 7 min read · LW link

OpenAI: Altman Returns
Zvi · Nov 30, 2023, 2:10 PM · 66 points · 12 comments · 11 min read · LW link (thezvi.wordpress.com)

[Linkpost] Remarks on the Convergence in Distribution of Random Neural Networks to Gaussian Processes in the Infinite Width Limit
carboniferous_umbraculum · Nov 30, 2023, 2:01 PM · 9 points · 0 comments · 1 min read · LW link (drive.google.com)

[Question] Buy Nothing Day is a great idea with a terrible app— why has nobody built a killer app for crowdsourced ‘effective communism’ yet?
lillybaeum · Nov 30, 2023, 1:47 PM · 8 points · 17 comments · 1 min read · LW link

[Question] Comprehensible Input is the only way people learn languages—is it the only way people *learn*?
lillybaeum · Nov 30, 2023, 1:31 PM · 8 points · 2 comments · 3 min read · LW link

Some Intuitions for the Ethicophysics
MadHatter and mishka · Nov 30, 2023, 6:47 AM · 2 points · 4 comments · 8 min read · LW link

The Alignment Agenda THEY Don’t Want You to Know About
MadHatter · Nov 30, 2023, 4:29 AM · −19 points · 16 comments · 1 min read · LW link

Cis fragility
[deactivated] · Nov 30, 2023, 4:14 AM · −51 points · 9 comments · 3 min read · LW link

Homework Answer: Glicko Ratings for War
MadHatter · Nov 30, 2023, 4:08 AM · −45 points · 1 comment · 77 min read · LW link (gist.github.com)

[Question] Feature Request for LessWrong
MadHatter · Nov 30, 2023, 3:19 AM · 11 points · 8 comments · 1 min read · LW link

My Alignment Research Agenda (“the Ethicophysics”)
MadHatter · Nov 30, 2023, 2:57 AM · −13 points · 0 comments · 1 min read · LW link

[Question] Stupid Question: Why am I getting consistently downvoted?
MadHatter · Nov 30, 2023, 12:21 AM · 31 points · 138 comments · 1 min read · LW link

Inositol Non-Results
Elizabeth · Nov 29, 2023, 9:40 PM · 20 points · 2 comments · 1 min read · LW link (acesounderglass.com)

Losing Metaphors: Zip and Paste
jefftk · Nov 29, 2023, 8:31 PM · 26 points · 6 comments · 1 min read · LW link (www.jefftk.com)

Preserving our heritage: Building a movement and a knowledge ark for current and future generations
rnk8 · Nov 29, 2023, 7:20 PM · 0 points · 5 comments · 12 min read · LW link

AGI Alignment is Absurd
Youssef Mohamed · Nov 29, 2023, 7:11 PM · −9 points · 4 comments · 3 min read · LW link

The origins of the steam engine: An essay with interactive animated diagrams
jasoncrawford · Nov 29, 2023, 6:30 PM · 30 points · 1 comment · 1 min read · LW link (rootsofprogress.org)

ChatGPT 4 solved all the gotcha problems I posed that tripped ChatGPT 3.5
VipulNaik · Nov 29, 2023, 6:11 PM · 33 points · 16 comments · 14 min read · LW link

“Clean” vs. “messy” goal-directedness (Section 2.2.3 of “Scheming AIs”)
Joe Carlsmith · Nov 29, 2023, 4:32 PM · 29 points · 1 comment · 11 min read · LW link

Lying Alignment Chart
Zack_M_Davis · Nov 29, 2023, 4:15 PM · 77 points · 17 comments · 1 min read · LW link

Rethink Priorities: Seeking Expressions of Interest for Special Projects Next Year
kierangreig · Nov 29, 2023, 1:59 PM · 4 points · 0 comments · 5 min read · LW link

[Question] Thoughts on teletransportation with copies?
titotal · Nov 29, 2023, 12:56 PM · 15 points · 13 comments · 1 min read · LW link

Interpretability with Sparse Autoencoders (Colab exercises)
CallumMcDougall · Nov 29, 2023, 12:56 PM · 76 points · 9 comments · 4 min read · LW link

The 101 Space You Will Always Have With You
Screwtape · Nov 29, 2023, 4:56 AM · 277 points · 23 comments · 6 min read · LW link · 1 review

Trust your intuition—Kahneman’s book misses the forest for the trees
mnvr · Nov 29, 2023, 4:37 AM · −2 points · 2 comments · 2 min read · LW link

Process Substitution Without Shell?
jefftk · Nov 29, 2023, 3:20 AM · 19 points · 18 comments · 2 min read · LW link (www.jefftk.com)

Deception Chess: Game #2
Zane · Nov 29, 2023, 2:43 AM · 29 points · 17 comments · 2 min read · LW link

Black Box Biology
GeneSmith · Nov 29, 2023, 2:27 AM · 65 points · 30 comments · 2 min read · LW link

[Question] What would be the shelf life of nuclear weapon-secrecy if nuclear weapons had not immediately been used in combat?
Gram Stone · Nov 29, 2023, 12:53 AM · 7 points · 2 comments · 1 min read · LW link

Scaling laws for dominant assurance contracts
jessicata · Nov 28, 2023, 11:11 PM · 36 points · 5 comments · 7 min read · LW link (unstableontology.com)

I’m confused about innate smell neuroanatomy
Steven Byrnes · Nov 28, 2023, 8:49 PM · 40 points · 2 comments · 9 min read · LW link

How to Control an LLM’s Behavior (why my P(DOOM) went down)
RogerDearnaley · Nov 28, 2023, 7:56 PM · 65 points · 30 comments · 11 min read · LW link

[Question] Is there a word for discrimination against A.I.?
Aaron Bohannon · Nov 28, 2023, 7:03 PM · 1 point · 4 comments · 1 min read · LW link

Update #2 to “Dominant Assurance Contract Platform”: EnsureDone
moyamo · Nov 28, 2023, 6:02 PM · 33 points · 2 comments · 1 min read · LW link

Ethicophysics II: Politics is the Mind-Savior
MadHatter · Nov 28, 2023, 4:27 PM · −9 points · 9 comments · 4 min read · LW link (bittertruths.substack.com)