[Question]
For fun: How long can you hold your breath?
exanova
Dec 6, 2023, 11:36 PM
1
point
7
comments
1
min read
LW
link
Mathematics As Physics
Nox ML
Dec 6, 2023, 10:27 PM
−2
points
10
comments
5
min read
LW
link
The counting argument for scheming (Sections 4.1 and 4.2 of “Scheming AIs”)
Joe Carlsmith
Dec 6, 2023, 7:28 PM
10
points
0
comments
10
min read
LW
link
On Trust
johnswentworth
Dec 6, 2023, 7:19 PM
42
points
26
comments
4
min read
LW
link
Originality vs. Correctness
alkjash
and
habryka
Dec 6, 2023, 6:51 PM
60
points
17
comments
25
min read
LW
link
Proposal for improving the global online discourse through personalised comment ordering on all websites
Roman Leventov
Dec 6, 2023, 6:51 PM
35
points
21
comments
6
min read
LW
link
Google Gemini Announced
Jacob G-W
Dec 6, 2023, 4:14 PM
54
points
22
comments
1
min read
LW
link
(blog.google)
Based Beff Jezos and the Accelerationists
Zvi
Dec 6, 2023, 4:00 PM
90
points
29
comments
12
min read
LW
link
(thezvi.wordpress.com)
Bucket Brigade: Likely End-of-Life
jefftk
Dec 6, 2023, 3:30 PM
16
points
1
comment
1
min read
LW
link
(www.jefftk.com)
Why Yudkowsky is wrong about “covalently bonded equivalents of biology”
titotal
Dec 6, 2023, 2:09 PM
44
points
41
comments
LW
link
(open.substack.com)
Metaculus Launches Chinese AI Chips Tournament, Supporting Institute for AI Policy and Strategy Research
ChristianWilliams
Dec 6, 2023, 11:26 AM
10
points
1
comment
LW
link
(www.metaculus.com)
Minimal Viable Paradise: How do we get The Good Future(TM)?
Nathan Young
Dec 6, 2023, 9:24 AM
9
points
0
comments
7
min read
LW
link
Anthropical Paradoxes are Paradoxes of Probability Theory
Ape in the coat
Dec 6, 2023, 8:16 AM
55
points
18
comments
5
min read
LW
link
Digital humans vs merge with AI? Same or different?
Nathan Helm-Burger
and
mishka
Dec 6, 2023, 4:56 AM
21
points
11
comments
7
min read
LW
link
EA Infrastructure Fund’s Plan to Focus on Principles-First EA
Linch
Dec 6, 2023, 3:24 AM
27
points
0
comments
LW
link
In defence of Helen Toner, Adam D’Angelo, and Tasha McCauley
mrtreasure
Dec 6, 2023, 2:02 AM
25
points
3
comments
9
min read
LW
link
(pastebin.com)
Some quick thoughts on “AI is easy to control”
Mikhail Samin
Dec 6, 2023, 12:58 AM
15
points
10
comments
7
min read
LW
link
ACX Corvallis, OR
kenakofer
Dec 6, 2023, 12:23 AM
1
point
0
comments
1
min read
LW
link
Multinational corporations as optimizers: a case for reaching across the aisle
sudo-nym
Dec 6, 2023, 12:14 AM
9
points
10
comments
1
min read
LW
link
[Question]
How do you feel about LessWrong these days? [Open feedback thread]
Bird Concept
Dec 5, 2023, 8:54 PM
108
points
285
comments
1
min read
LW
link
Critique-a-Thon of AI Alignment Plans
Iknownothing
Dec 5, 2023, 8:50 PM
12
points
3
comments
1
min read
LW
link
Arguments for/against scheming that focus on the path SGD takes (Section 3 of “Scheming AIs”)
Joe Carlsmith
Dec 5, 2023, 6:48 PM
10
points
0
comments
23
min read
LW
link
In defence of Helen Toner, Adam D’Angelo, and Tasha McCauley (OpenAI post)
mrtreasure
Dec 5, 2023, 6:40 PM
6
points
2
comments
1
min read
LW
link
(pastebin.com)
Studying The Alien Mind
Quentin FEUILLADE--MONTIXI
and
NicholasKees
Dec 5, 2023, 5:27 PM
80
points
10
comments
15
min read
LW
link
Deep Forgetting & Unlearning for Safely-Scoped LLMs
scasper
Dec 5, 2023, 4:48 PM
126
points
30
comments
13
min read
LW
link
On ‘Responsible Scaling Policies’ (RSPs)
Zvi
Dec 5, 2023, 4:10 PM
48
points
3
comments
37
min read
LW
link
(thezvi.wordpress.com)
We’re all in this together
Tamsin Leake
Dec 5, 2023, 1:57 PM
69
points
65
comments
2
min read
LW
link
A Socratic dialogue with my student
lsusr
Dec 5, 2023, 9:31 AM
36
points
14
comments
6
min read
LW
link
Neural uncertainty estimation review article (for alignment)
Charlie Steiner
Dec 5, 2023, 8:01 AM
74
points
3
comments
11
min read
LW
link
Analyzing the Historical Rate of Catastrophes
jsteinhardt
Dec 5, 2023, 6:30 AM
15
points
0
comments
16
min read
LW
link
(bounded-regret.ghost.io)
Some open-source dictionaries and dictionary learning infrastructure
Sam Marks
Dec 5, 2023, 6:05 AM
46
points
7
comments
5
min read
LW
link
The LessWrong 2022 Review
habryka
Dec 5, 2023, 4:00 AM
115
points
43
comments
4
min read
LW
link
Bands And Low-stakes Dances
jefftk
Dec 5, 2023, 3:50 AM
20
points
0
comments
1
min read
LW
link
(www.jefftk.com)
Accelerating science through evolvable institutions
jasoncrawford
Dec 4, 2023, 11:21 PM
19
points
9
comments
6
min read
LW
link
(rootsofprogress.org)
Speaking to Congressional staffers about AI risk
Orpheus16
and
hath
Dec 4, 2023, 11:08 PM
312
points
25
comments
15
min read
LW
link
1
review
Open Thread – Winter 2023/2024
habryka
Dec 4, 2023, 10:59 PM
35
points
160
comments
1
min read
LW
link
Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo
Dec 4, 2023, 10:58 PM
37
points
0
comments
35
min read
LW
link
2023 Alignment Research Updates from FAR AI
AdamGleave
and
EuanMcLean
Dec 4, 2023, 10:32 PM
18
points
0
comments
8
min read
LW
link
(far.ai)
What’s new at FAR AI
AdamGleave
and
EuanMcLean
Dec 4, 2023, 9:18 PM
41
points
0
comments
5
min read
LW
link
(far.ai)
n of m ring signatures
DanielFilan
Dec 4, 2023, 8:00 PM
51
points
7
comments
1
min read
LW
link
(danielfilan.com)
Mechanistic interpretability through clustering
Alistair Fraser
Dec 4, 2023, 6:49 PM
1
point
0
comments
1
min read
LW
link
Agents which are EU-maximizing as a group are not EU-maximizing individually
Mlxa
Dec 4, 2023, 6:49 PM
3
points
2
comments
2
min read
LW
link
Planning in LLMs: Insights from AlphaGo
jco
Dec 4, 2023, 6:48 PM
8
points
10
comments
11
min read
LW
link
Non-classic stories about scheming (Section 2.3.2 of “Scheming AIs”)
Joe Carlsmith
Dec 4, 2023, 6:44 PM
9
points
0
comments
20
min read
LW
link
6. The Mutable Values Problem in Value Learning and CEV
RogerDearnaley
Dec 4, 2023, 6:31 PM
12
points
0
comments
49
min read
LW
link
Updates to Open Phil’s career development and transition funding program
abergal
and
Bastian Stern
Dec 4, 2023, 6:10 PM
28
points
0
comments
2
min read
LW
link
[Valence series] 1. Introduction
Steven Byrnes
Dec 4, 2023, 3:40 PM
99
points
16
comments
16
min read
LW
link
2
reviews
South Bay Meetup 12/9
David Friedman
Dec 4, 2023, 7:32 AM
2
points
0
comments
1
min read
LW
link
Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation
Paul Bricman
Dec 4, 2023, 7:31 AM
12
points
6
comments
16
min read
LW
link
(arxiv.org)
A call for a quantitative report card for AI bioterrorism threat models
Juno
Dec 4, 2023, 6:35 AM
12
points
0
comments
10
min read
LW
link