LessWrong Archive
- Soviet comedy film recommendations · Nina Panickssery · Jun 9, 2024, 11:40 PM · 42 points · 11 comments · 2 min read · (open.substack.com)
- The Data Wall is Important · JustisMills · Jun 9, 2024, 10:54 PM · 40 points · 20 comments · 2 min read · (justismills.substack.com)
- Two Family Dance Flyers · jefftk · Jun 9, 2024, 8:50 PM · 13 points · 0 comments · 1 min read · (www.jefftk.com)
- [Question] What happens to existing life sentences under LEV? · O O · Jun 9, 2024, 5:49 PM · 5 points · 7 comments · 1 min read
- 3b. Formal (Faux) Corrigibility · Max Harms · Jun 9, 2024, 5:18 PM · 26 points · 13 comments · 17 min read
- 3a. Towards Formal Corrigibility · Max Harms · Jun 9, 2024, 4:53 PM · 24 points · 2 comments · 19 min read
- Introducing SARA: a new activation steering technique · Alejandro Tlaie · Jun 9, 2024, 3:33 PM · 17 points · 7 comments · 6 min read
- “What the hell is a representation, anyway?” | Clarifying AI interpretability with tools from philosophy of cognitive science | Part 1: Vehicles vs. contents · IwanWilliams · Jun 9, 2024, 2:19 PM · 9 points · 1 comment · 4 min read
- Exploring Llama-3-8B MLP Neurons · ntt123 · Jun 9, 2024, 2:19 PM · 10 points · 0 comments · 4 min read · (neuralblog.github.io)
- Demystifying “Alignment” through a Comic · milanrosko · Jun 9, 2024, 8:24 AM · 106 points · 19 comments · 1 min read
- Dumbing down · Martin Sustrik · Jun 9, 2024, 6:50 AM · 72 points · 1 comment · 4 min read
- What if a tech company forced you to move to NYC? · KatjaGrace · Jun 9, 2024, 6:30 AM · 56 points · 22 comments · 1 min read · (worldspiritsockpuppet.com)
- [Question] What should I do? (long term plan about starting an AI lab) · not_a_cat · Jun 9, 2024, 12:45 AM · 2 points · 1 comment · 2 min read
- Searching for the Root of the Tree of Evil · Ivan Vendrov · Jun 8, 2024, 5:05 PM · 36 points · 14 comments · 5 min read · (nothinghuman.substack.com)
- 2. Corrigibility Intuition · Max Harms · Jun 8, 2024, 3:52 PM · 67 points · 10 comments · 33 min read
- Two easy things that maybe Just Work to improve AI discourse · Bird Concept · Jun 8, 2024, 3:51 PM · 191 points · 35 comments · 2 min read
- I made an AI safety fellowship. What I wish I knew. · Ruben Castaing · Jun 8, 2024, 3:23 PM · 12 points · 0 comments · 2 min read
- Alignment Gaps · kcyras · Jun 8, 2024, 3:23 PM · 11 points · 4 comments · 8 min read
- The Slack Double Crux, or how to negotiate with yourself · Thac0 · Jun 8, 2024, 3:22 PM · 6 points · 2 comments · 4 min read
- The Perils of Popularity: A Critical Examination of LessWrong’s Rational Discourse · BubbaJoeLouis · Jun 8, 2024, 3:22 PM · −24 points · 3 comments · 2 min read
- Status quo bias is usually justified · Amadeus Pagel · Jun 8, 2024, 2:54 PM · 10 points · 3 comments · 1 min read · (amadeuspagel.substack.com)
- Closed-Source Evaluations · Jono · Jun 8, 2024, 2:18 PM · 15 points · 4 comments · 1 min read
- Access to powerful AI might make computer security radically easier · Buck · Jun 8, 2024, 6:00 AM · 105 points · 14 comments · 6 min read
- [Question] Why don’t we just get rid of all the bioethicists? · Sable · Jun 8, 2024, 3:48 AM · 13 points · 0 comments · 1 min read
- Sev, Sevteen, Sevty, Sevth · jefftk · Jun 8, 2024, 2:30 AM · 17 points · 9 comments · 1 min read · (www.jefftk.com)
- 1. The CAST Strategy · Max Harms · Jun 7, 2024, 10:29 PM · 48 points · 22 comments · 38 min read
- 0. CAST: Corrigibility as Singular Target · Max Harms · Jun 7, 2024, 10:29 PM · 147 points · 17 comments · 8 min read
- What is space? What is time? · Tahp · Jun 7, 2024, 10:15 PM · 8 points · 3 comments · 7 min read
- [Question] Question about Lewis’ counterfactual theory of causation · jbkjr · Jun 7, 2024, 8:15 PM · 12 points · 7 comments · 1 min read
- Relationships among words, metalingual definition, and interpretability · Bill Benzon · Jun 7, 2024, 7:18 PM · 2 points · 0 comments · 5 min read
- Let’s Talk About Emergence · jacobhaimes · Jun 7, 2024, 7:18 PM · 4 points · 0 comments · 7 min read · (www.odysseaninstitute.org)
- D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues · aphyer · Jun 7, 2024, 7:02 PM · 42 points · 16 comments · 3 min read
- Natural Latents Are Not Robust To Tiny Mixtures · johnswentworth and David Lorell · Jun 7, 2024, 6:53 PM · 61 points · 8 comments · 5 min read
- Situational Awareness Summarized—Part 2 · Joe Rogero · Jun 7, 2024, 5:20 PM · 12 points · 2 comments · 4 min read
- Frida van Lisa, a short story about adversarial AI attacks on humans · arisAlexis · Jun 7, 2024, 1:22 PM · 2 points · 0 comments · 18 min read
- Quotes from Leopold Aschenbrenner’s Situational Awareness Paper · Zvi · Jun 7, 2024, 11:40 AM · 97 points · 10 comments · 37 min read · (thezvi.wordpress.com)
- LessWrong/ACX meetup Transilvanya tour—Cluj Napoca · Marius Adrian Nicoară · Jun 7, 2024, 5:45 AM · 1 point · 1 comment · 1 min read
- Is Claude a mystic? · jessicata · Jun 7, 2024, 4:27 AM · 60 points · 23 comments · 13 min read · (unstablerontology.substack.com)
- Offering Completion · jefftk · Jun 7, 2024, 1:40 AM · 29 points · 6 comments · 1 min read · (www.jefftk.com)
- A Case for Superhuman Governance, using AI · ozziegooen · Jun 7, 2024, 12:10 AM · 30 points · 0 comments
- Memorizing weak examples can elicit strong behavior out of password-locked models · Fabien Roger and ryan_greenblatt · Jun 6, 2024, 11:54 PM · 58 points · 5 comments · 7 min read
- Response to Aschenbrenner’s “Situational Awareness” · Rob Bensinger · Jun 6, 2024, 10:57 PM · 194 points · 27 comments · 3 min read
- Scaling and evaluating sparse autoencoders · leogao · Jun 6, 2024, 10:50 PM · 106 points · 6 comments · 1 min read
- Humming is not a free $100 bill · Elizabeth · Jun 6, 2024, 8:10 PM · 185 points · 6 comments · 3 min read · (acesounderglass.com)
- There Are No Primordial Definitions of Man/Woman · ymeskhout · Jun 6, 2024, 7:30 PM · 11 points · 0 comments · 4 min read · (ymeskhout.substack.com)
- Situational Awareness Summarized—Part 1 · Joe Rogero · Jun 6, 2024, 6:59 PM · 21 points · 0 comments · 5 min read
- [Link Post] “Foundational Challenges in Assuring Alignment and Safety of Large Language Models” · David Scott Krueger (formerly: capybaralet) · Jun 6, 2024, 6:55 PM · 70 points · 2 comments · 6 min read · (llm-safety-challenges.github.io)
- AI #67: Brief Strange Trip · Zvi · Jun 6, 2024, 6:50 PM · 49 points · 6 comments · 40 min read · (thezvi.wordpress.com)
- The Human Biological Advantage Over AI · Wstewart · Jun 6, 2024, 6:18 PM · −13 points · 2 comments · 1 min read
- An evaluation of Helen Toner’s interview on the TED AI Show · PeterH · Jun 6, 2024, 5:39 PM · 24 points · 2 comments · 30 min read