Page 2
Corrigibility = Tool-ness? | johnswentworth and David Lorell | Jun 28, 2024, 1:19 AM | 78 points | 8 comments | 9 min read
[New Feature] Your Subscribed Feed | Ruby and RobertM | Jun 11, 2024, 10:45 PM | 77 points | 13 comments | 4 min read
Claude 3.5 Sonnet (www.anthropic.com) | Zach Stein-Perlman | Jun 20, 2024, 6:00 PM | 75 points | 41 comments | 1 min read
(Not) Derailing the LessOnline Puzzle Hunt | Error | Jun 4, 2024, 1:28 AM | 74 points | 2 comments | 4 min read
MIRI’s June 2024 Newsletter (intelligence.org) | Harlan | Jun 14, 2024, 11:02 PM | 74 points | 20 comments | 2 min read
Mistakes people make when thinking about units | Isaac King | Jun 25, 2024, 3:39 AM | 74 points | 14 comments | 7 min read
Companies’ safety plans neglect risks from scheming AI | Zach Stein-Perlman | Jun 3, 2024, 3:00 PM | 73 points | 4 comments | 6 min read
Dumbing down | Martin Sustrik | Jun 9, 2024, 6:50 AM | 72 points | 1 comment | 4 min read
Shard Theory—is it true for humans? | Rishika | Jun 14, 2024, 7:21 PM | 71 points | 7 comments | 15 min read
[Link Post] “Foundational Challenges in Assuring Alignment and Safety of Large Language Models” (llm-safety-challenges.github.io) | David Scott Krueger (formerly: capybaralet) | Jun 6, 2024, 6:55 PM | 70 points | 2 comments | 6 min read
Former OpenAI Superalignment Researcher: Superintelligence by 2030 (situational-awareness.ai) | Julian Bradshaw | Jun 5, 2024, 3:35 AM | 70 points | 30 comments | 1 min read
Different senses in which two AIs can be “the same” | Vivek Hebbar and Buck | Jun 24, 2024, 3:16 AM | 69 points | 2 comments | 4 min read
2. Corrigibility Intuition | Max Harms | Jun 8, 2024, 3:52 PM | 67 points | 10 comments | 33 min read
SB 1047 Is Weakened (thezvi.wordpress.com) | Zvi | Jun 6, 2024, 1:40 PM | 67 points | 4 comments | 9 min read
Interpreting and Steering Features in Images | Gytis Daujotas | Jun 20, 2024, 6:33 PM | 66 points | 6 comments | 5 min read
AI #69: Nice (thezvi.wordpress.com) | Zvi | Jun 20, 2024, 12:40 PM | 65 points | 9 comments | 51 min read
How a chip is designed | YM | Jun 28, 2024, 8:04 AM | 65 points | 4 comments | 5 min read
AiPhone (thezvi.wordpress.com) | Zvi | Jun 12, 2024, 10:20 PM | 63 points | 4 comments | 14 min read
“Metastrategic Brainstorming”, a core building-block skill | Raemon | Jun 11, 2024, 4:27 AM | 63 points | 5 comments | 6 min read
What is a Tool? | johnswentworth and David Lorell | Jun 25, 2024, 11:40 PM | 62 points | 4 comments | 6 min read
Natural Latents Are Not Robust To Tiny Mixtures | johnswentworth and David Lorell | Jun 7, 2024, 6:53 PM | 61 points | 8 comments | 5 min read
Is Claude a mystic? (unstablerontology.substack.com) | jessicata | Jun 7, 2024, 4:27 AM | 60 points | 23 comments | 13 min read
microwave drilling is impractical (www.bhauth.com) | bhauth | Jun 12, 2024, 10:16 PM | 59 points | 19 comments | 4 min read
Memorizing weak examples can elicit strong behavior out of password-locked models | Fabien Roger and ryan_greenblatt | Jun 6, 2024, 11:54 PM | 58 points | 5 comments | 7 min read
Datasets that change the odds you exist (dynomight.net) | dynomight | Jun 29, 2024, 6:45 PM | 56 points | 4 comments | 6 min read
Degeneracies are sticky for SGD | Guillaume Corlouer and Nicolas Macé | Jun 16, 2024, 9:19 PM | 56 points | 1 comment | 16 min read
What if a tech company forced you to move to NYC? (worldspiritsockpuppet.com) | KatjaGrace | Jun 9, 2024, 6:30 AM | 56 points | 22 comments | 1 min read
Calculating Natural Latents via Resampling | johnswentworth and David Lorell | Jun 6, 2024, 12:37 AM | 55 points | 4 comments | 10 min read
4. Existing Writing on Corrigibility | Max Harms | Jun 10, 2024, 2:08 PM | 55 points | 15 comments | 106 min read
On “first critical tries” in AI alignment | Joe Carlsmith | Jun 5, 2024, 12:19 AM | 54 points | 8 comments | 14 min read
Fat Tails Discourage Compromise | niplav | Jun 17, 2024, 9:39 AM | 53 points | 5 comments | 1 min read
Book Review: Righteous Victims—A History of the Zionist-Arab Conflict | Yair Halberstadt | Jun 24, 2024, 11:02 AM | 53 points | 8 comments | 34 min read
Schelling points in the AGI policy space | mesaoptimizer | Jun 26, 2024, 1:19 PM | 52 points | 2 comments | 6 min read
Two LessWrong speed friending experiments | mikko and sanyer | Jun 15, 2024, 10:52 AM | 52 points | 3 comments | 4 min read
So you want to work on technical AI safety | gw | Jun 24, 2024, 2:29 PM | 51 points | 3 comments | 14 min read
Bed Time Quests & Dinner Games for 3-5 year olds (kidquest.substack.com) | Gunnar_Zarncke and Shoshannah Tekofsky | Jun 22, 2024, 7:53 AM | 51 points | 0 comments | 1 min read
D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues Evaluation & Ruleset | aphyer | Jun 17, 2024, 9:29 PM | 51 points | 11 comments | 6 min read
how birds sense magnetic fields (www.bhauth.com) | bhauth | Jun 27, 2024, 6:59 PM | 51 points | 4 comments | 5 min read
Philosophers wrestling with evil, as a social media feed | David Gross | Jun 3, 2024, 10:25 PM | 51 points | 2 comments | 16 min read
An issue with training schemers with supervised fine-tuning | Fabien Roger | Jun 27, 2024, 3:37 PM | 49 points | 12 comments | 6 min read
AI #67: Brief Strange Trip (thezvi.wordpress.com) | Zvi | Jun 6, 2024, 6:50 PM | 49 points | 6 comments | 40 min read
in defense of Linus Pauling (www.bhauth.com) | bhauth | Jun 3, 2024, 9:27 PM | 49 points | 8 comments | 2 min read
Contra Acemoglu on AI (www.maximum-progress.com) | Maxwell Tabarrok | Jun 28, 2024, 1:13 PM | 48 points | 0 comments | 5 min read
[Valence series] 4. Valence & Liking / Admiring | Steven Byrnes | Jun 10, 2024, 2:19 PM | 48 points | 12 comments | 15 min read
What distinguishes “early”, “mid” and “end” games? | Raemon | Jun 21, 2024, 5:41 PM | 48 points | 22 comments | 1 min read
1. The CAST Strategy | Max Harms | Jun 7, 2024, 10:29 PM | 48 points | 22 comments | 38 min read
On OpenAI’s Model Spec (thezvi.wordpress.com) | Zvi | 21 Jun 2024 13:00 UTC | 47 points | 4 comments | 30 min read
Enriched tab is now the default LW Frontpage experience for logged-in users | Ruby and RobertM | 21 Jun 2024 0:09 UTC | 46 points | 27 comments | 3 min read
AI #68: Remarkably Reasonable Reactions (thezvi.wordpress.com) | Zvi | 13 Jun 2024 16:30 UTC | 46 points | 11 comments | 50 min read
Higher-effort summer solstice: What if we used AI (i.e., Angel Island)? | Rachel Shu | 25 Jun 2024 1:35 UTC | 46 points | 9 comments | 3 min read