Standard SAEs Might Be Incoherent: A Choosing Problem & A “Concise” Solution
Kola Ayonrinde
Oct 30, 2024, 10:50 PM
27
points
0
comments
12
min read
LW
link
Generic advice caveats
Saul Munn
Oct 30, 2024, 9:03 PM
27
points
1
comment
3
min read
LW
link
(www.brasstacks.blog)
I turned decision theory problems into memes about trolleys
Tapatakt
Oct 30, 2024, 8:13 PM
104
points
23
comments
1
min read
LW
link
AI as a powerful meme, via CGP Grey
TheManxLoiner
Oct 30, 2024, 6:31 PM
46
points
8
comments
4
min read
LW
link
[Question]
How might language influence how an AI “thinks”?
bodry
Oct 30, 2024, 5:41 PM
3
points
0
comments
1
min read
LW
link
Motivation control
Joe Carlsmith
Oct 30, 2024, 5:15 PM
45
points
7
comments
52
min read
LW
link
Updating the NAO Simulator
jefftk
Oct 30, 2024, 1:50 PM
11
points
0
comments
2
min read
LW
link
(www.jefftk.com)
Occupational Licensing Roundup #1
Zvi
Oct 30, 2024, 11:00 AM
65
points
11
comments
11
min read
LW
link
(thezvi.wordpress.com)
Three Notions of “Power”
johnswentworth
Oct 30, 2024, 6:10 AM
92
points
44
comments
4
min read
LW
link
Introduction to Choice set Misspecification in Reward Inference
Rahul Chand
Oct 29, 2024, 10:57 PM
1
point
0
comments
8
min read
LW
link
Gothenburg LW/ACX meetup
Stefan
Oct 29, 2024, 8:40 PM
2
points
0
comments
1
min read
LW
link
The Alignment Trap: AI Safety as Path to Power
crispweed
Oct 29, 2024, 3:21 PM
57
points
17
comments
5
min read
LW
link
(upcoder.com)
Housing Roundup #10
Zvi
Oct 29, 2024, 1:50 PM
32
points
2
comments
32
min read
LW
link
(thezvi.wordpress.com)
[Intuitive self-models] 7. Hearing Voices, and Other Hallucinations
Steven Byrnes
Oct 29, 2024, 1:36 PM
51
points
2
comments
16
min read
LW
link
Review: “The Case Against Reality”
David Gross
Oct 29, 2024, 1:13 PM
20
points
9
comments
5
min read
LW
link
A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More
Sharat Jacob Jacob
Oct 29, 2024, 12:41 PM
12
points
0
comments
9
min read
LW
link
Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence
EuanMcLean
Oct 29, 2024, 12:16 PM
45
points
9
comments
26
min read
LW
link
AI #87: Staying in Character
Zvi
Oct 29, 2024, 7:10 AM
57
points
3
comments
33
min read
LW
link
(thezvi.wordpress.com)
A path to human autonomy
Nathan Helm-Burger
Oct 29, 2024, 3:02 AM
53
points
16
comments
20
min read
LW
link
D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset
aphyer
Oct 29, 2024, 1:21 AM
47
points
13
comments
6
min read
LW
link
Gwern: Why So Few Matt Levines?
kave
Oct 29, 2024, 1:07 AM
78
points
10
comments
1
min read
LW
link
(gwern.net)
October 2024 Progress in Guaranteed Safe AI
Quinn
Oct 28, 2024, 11:34 PM
7
points
0
comments
1
min read
LW
link
(gsai.substack.com)
5 homegrown EA projects, seeking small donors
Austin Chen
Oct 28, 2024, 11:24 PM
85
points
4
comments
LW
link
How might we solve the alignment problem? (Part 1: Intro, summary, ontology)
Joe Carlsmith
Oct 28, 2024, 9:57 PM
54
points
5
comments
32
min read
LW
link
Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations
ozziegooen
Oct 28, 2024, 9:44 PM
7
points
0
comments
LW
link
AI & wisdom 3: AI effects on amortised optimisation
L Rudolf L
Oct 28, 2024, 9:08 PM
18
points
0
comments
14
min read
LW
link
(rudolf.website)
AI & wisdom 2: growth and amortised optimisation
L Rudolf L
Oct 28, 2024, 9:07 PM
18
points
0
comments
8
min read
LW
link
(rudolf.website)
AI & wisdom 1: wisdom, amortised optimisation, and AI
L Rudolf L
Oct 28, 2024, 9:02 PM
29
points
0
comments
15
min read
LW
link
(rudolf.website)
Finishing The SB-1047 Documentary In 6 Weeks
Michaël Trazzi
Oct 28, 2024, 8:17 PM
94
points
7
comments
4
min read
LW
link
(manifund.org)
Towards the Operationalization of Philosophy & Wisdom
Thane Ruthenis
Oct 28, 2024, 7:45 PM
20
points
2
comments
33
min read
LW
link
(aiimpacts.org)
Quantitative Trading Bootcamp [Nov 6-10]
Ricki Heicklen
Oct 28, 2024, 6:39 PM
7
points
0
comments
1
min read
LW
link
Winners of the Essay competition on the Automation of Wisdom and Philosophy
owencb
and
AI Impacts
Oct 28, 2024, 5:10 PM
40
points
3
comments
30
min read
LW
link
(blog.aiimpacts.org)
Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority
Zach Stein-Perlman
Oct 28, 2024, 5:00 PM
22
points
4
comments
3
min read
LW
link
(milesbrundage.substack.com)
[Question]
somebody explain the word “epistemic” to me
KvmanThinking
Oct 28, 2024, 4:40 PM
7
points
8
comments
1
min read
LW
link
~80 Interesting Questions about Foundation Model Agent Safety
RohanS
and
Govind Pimpale
Oct 28, 2024, 4:37 PM
46
points
4
comments
15
min read
LW
link
AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels
Corin Katzke
,
Alexa Pan
and
Dan H
Oct 28, 2024, 4:03 PM
6
points
0
comments
6
min read
LW
link
(newsletter.safe.ai)
Death notes − 7 thoughts on death
Nathan Young
Oct 28, 2024, 3:01 PM
26
points
1
comment
5
min read
LW
link
(nathanpmyoung.substack.com)
SAEs you can See: Applying Sparse Autoencoders to Clustering
Robert_AIZI
Oct 28, 2024, 2:48 PM
27
points
0
comments
10
min read
LW
link
Bridging the VLM and mech interp communities for multimodal interpretability
Sonia Joseph
Oct 28, 2024, 2:41 PM
19
points
5
comments
15
min read
LW
link
How Likely Are Various Precursors of Existential Risk?
NunoSempere
Oct 28, 2024, 1:27 PM
55
points
4
comments
15
min read
LW
link
(blog.sentinel-team.org)
Care Doesn’t Scale
stavros
Oct 28, 2024, 11:57 AM
27
points
1
comment
1
min read
LW
link
(stevenscrawls.com)
Your memory eventually drives confidence in each hypothesis to 1 or 0
Crazy philosopher
Oct 28, 2024, 9:00 AM
3
points
6
comments
1
min read
LW
link
Nerdtrition: simple diets via spreadsheet abuse
dkl9
27 Oct 2024 21:45 UTC
8
points
0
comments
3
min read
LW
link
(dkl9.net)
AGI Fermi Paradox
jrincayc
27 Oct 2024 20:14 UTC
0
points
2
comments
2
min read
LW
link
Substituting Talkbox for Breath Controller
jefftk
27 Oct 2024 19:10 UTC
11
points
0
comments
1
min read
LW
link
(www.jefftk.com)
Open Source Replication of Anthropic’s Crosscoder paper for model-diffing
Connor Kissane
,
robertzk
,
Arthur Conmy
and
Neel Nanda
27 Oct 2024 18:46 UTC
48
points
4
comments
5
min read
LW
link
Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org)
spencerg
27 Oct 2024 17:34 UTC
16
points
0
comments
LW
link
Interview with Bill O’Rourke—Russian Corruption, Putin, Applied Ethics, and More
JohnGreer
27 Oct 2024 17:11 UTC
3
points
0
comments
6
min read
LW
link
On Shifgrethor
JustisMills
27 Oct 2024 15:30 UTC
67
points
18
comments
2
min read
LW
link
(justismills.substack.com)
The hostile telepaths problem
Valentine
27 Oct 2024 15:26 UTC
383
points
89
comments
15
min read
LW
link