- Introduction to Choice set Misspecification in Reward Inference · Rahul Chand · Oct 29, 2024, 10:57 PM · 1 point · 0 comments · 8 min read · LW link
- Gothenburg LW/ACX meetup · Stefan · Oct 29, 2024, 8:40 PM · 2 points · 0 comments · 1 min read · LW link
- The Alignment Trap: AI Safety as Path to Power · crispweed · Oct 29, 2024, 3:21 PM · 57 points · 17 comments · 5 min read · LW link (upcoder.com)
- Housing Roundup #10 · Zvi · Oct 29, 2024, 1:50 PM · 32 points · 2 comments · 32 min read · LW link (thezvi.wordpress.com)
- [Intuitive self-models] 7. Hearing Voices, and Other Hallucinations · Steven Byrnes · Oct 29, 2024, 1:36 PM · 51 points · 2 comments · 16 min read · LW link
- Review: “The Case Against Reality” · David Gross · Oct 29, 2024, 1:13 PM · 20 points · 9 comments · 5 min read · LW link
- A Poem Is All You Need: Jailbreaking ChatGPT, Meta & More · Sharat Jacob Jacob · Oct 29, 2024, 12:41 PM · 12 points · 0 comments · 9 min read · LW link
- Searching for phenomenal consciousness in LLMs: Perceptual reality monitoring and introspective confidence · EuanMcLean · Oct 29, 2024, 12:16 PM · 45 points · 9 comments · 26 min read · LW link
- AI #87: Staying in Character · Zvi · Oct 29, 2024, 7:10 AM · 57 points · 3 comments · 33 min read · LW link (thezvi.wordpress.com)
- A path to human autonomy · Nathan Helm-Burger · Oct 29, 2024, 3:02 AM · 53 points · 16 comments · 20 min read · LW link
- D&D.Sci Coliseum: Arena of Data Evaluation and Ruleset · aphyer · Oct 29, 2024, 1:21 AM · 47 points · 13 comments · 6 min read · LW link
- Gwern: Why So Few Matt Levines? · kave · Oct 29, 2024, 1:07 AM · 78 points · 10 comments · 1 min read · LW link (gwern.net)
- October 2024 Progress in Guaranteed Safe AI · Quinn · Oct 28, 2024, 11:34 PM · 7 points · 0 comments · 1 min read · LW link (gsai.substack.com)
- 5 homegrown EA projects, seeking small donors · Austin Chen · Oct 28, 2024, 11:24 PM · 85 points · 4 comments · LW link
- How might we solve the alignment problem? (Part 1: Intro, summary, ontology) · Joe Carlsmith · Oct 28, 2024, 9:57 PM · 54 points · 5 comments · 32 min read · LW link
- Enhancing Mathematical Modeling with LLMs: Goals, Challenges, and Evaluations · ozziegooen · Oct 28, 2024, 9:44 PM · 7 points · 0 comments · LW link
- AI & wisdom 3: AI effects on amortised optimisation · L Rudolf L · Oct 28, 2024, 9:08 PM · 18 points · 0 comments · 14 min read · LW link (rudolf.website)
- AI & wisdom 2: growth and amortised optimisation · L Rudolf L · Oct 28, 2024, 9:07 PM · 18 points · 0 comments · 8 min read · LW link (rudolf.website)
- AI & wisdom 1: wisdom, amortised optimisation, and AI · L Rudolf L · Oct 28, 2024, 9:02 PM · 29 points · 0 comments · 15 min read · LW link (rudolf.website)
- Finishing The SB-1047 Documentary In 6 Weeks · Michaël Trazzi · Oct 28, 2024, 8:17 PM · 94 points · 7 comments · 4 min read · LW link (manifund.org)
- Towards the Operationalization of Philosophy & Wisdom · Thane Ruthenis · Oct 28, 2024, 7:45 PM · 20 points · 2 comments · 33 min read · LW link (aiimpacts.org)
- Quantitative Trading Bootcamp [Nov 6-10] · Ricki Heicklen · Oct 28, 2024, 6:39 PM · 7 points · 0 comments · 1 min read · LW link
- Winners of the Essay competition on the Automation of Wisdom and Philosophy · owencb and AI Impacts · Oct 28, 2024, 5:10 PM · 40 points · 3 comments · 30 min read · LW link (blog.aiimpacts.org)
- Miles Brundage: Finding Ways to Credibly Signal the Benignness of AI Development and Deployment is an Urgent Priority · Zach Stein-Perlman · Oct 28, 2024, 5:00 PM · 22 points · 4 comments · 3 min read · LW link (milesbrundage.substack.com)
- [Question] somebody explain the word “epistemic” to me · KvmanThinking · Oct 28, 2024, 4:40 PM · 7 points · 8 comments · 1 min read · LW link
- ~80 Interesting Questions about Foundation Model Agent Safety · RohanS and Govind Pimpale · Oct 28, 2024, 4:37 PM · 46 points · 4 comments · 15 min read · LW link
- AI Safety Newsletter #43: White House Issues First National Security Memo on AI Plus, AI and Job Displacement, and AI Takes Over the Nobels · Corin Katzke, Alexa Pan and Dan H · Oct 28, 2024, 4:03 PM · 6 points · 0 comments · 6 min read · LW link (newsletter.safe.ai)
- Death notes − 7 thoughts on death · Nathan Young · Oct 28, 2024, 3:01 PM · 26 points · 1 comment · 5 min read · LW link (nathanpmyoung.substack.com)
- SAEs you can See: Applying Sparse Autoencoders to Clustering · Robert_AIZI · Oct 28, 2024, 2:48 PM · 27 points · 0 comments · 10 min read · LW link
- Bridging the VLM and mech interp communities for multimodal interpretability · Sonia Joseph · Oct 28, 2024, 2:41 PM · 19 points · 5 comments · 15 min read · LW link
- How Likely Are Various Precursors of Existential Risk? · NunoSempere · Oct 28, 2024, 1:27 PM · 55 points · 4 comments · 15 min read · LW link (blog.sentinel-team.org)
- Care Doesn’t Scale · stavros · Oct 28, 2024, 11:57 AM · 27 points · 1 comment · 1 min read · LW link (stevenscrawls.com)
- Your memory eventually drives confidence in each hypothesis to 1 or 0 · Crazy philosopher · Oct 28, 2024, 9:00 AM · 3 points · 6 comments · 1 min read · LW link
- Nerdtrition: simple diets via spreadsheet abuse · dkl9 · Oct 27, 2024, 9:45 PM · 8 points · 0 comments · 3 min read · LW link (dkl9.net)
- AGI Fermi Paradox · jrincayc · Oct 27, 2024, 8:14 PM · 0 points · 2 comments · 2 min read · LW link
- Substituting Talkbox for Breath Controller · jefftk · Oct 27, 2024, 7:10 PM · 11 points · 0 comments · 1 min read · LW link (www.jefftk.com)
- Open Source Replication of Anthropic’s Crosscoder paper for model-diffing · Connor Kissane, robertzk, Arthur Conmy and Neel Nanda · Oct 27, 2024, 6:46 PM · 48 points · 4 comments · 5 min read · LW link
- Hiring a writer to co-author with me (Spencer Greenberg for ClearerThinking.org) · spencerg · Oct 27, 2024, 5:34 PM · 16 points · 0 comments · LW link
- Interview with Bill O’Rourke—Russian Corruption, Putin, Applied Ethics, and More · JohnGreer · Oct 27, 2024, 5:11 PM · 3 points · 0 comments · 6 min read · LW link
- On Shifgrethor · JustisMills · Oct 27, 2024, 3:30 PM · 67 points · 18 comments · 2 min read · LW link (justismills.substack.com)
- The hostile telepaths problem · Valentine · Oct 27, 2024, 3:26 PM · 383 points · 89 comments · 15 min read · LW link
- [Question] What are some good ways to form opinions on controversial subjects in the current and upcoming era? · Terence Coelho · Oct 27, 2024, 2:33 PM · 9 points · 21 comments · 1 min read · LW link
- Video lectures on the learning-theoretic agenda · Vanessa Kosoy · Oct 27, 2024, 12:01 PM · 75 points · 0 comments · 1 min read · LW link (www.youtube.com)
- Dario Amodei’s “Machines of Loving Grace” sound incredibly dangerous, for Humans · Super AGI · Oct 27, 2024, 5:05 AM · 8 points · 1 comment · 1 min read · LW link
- Electrostatic Airships? · DaemonicSigil · Oct 27, 2024, 4:32 AM · 64 points · 13 comments · 3 min read · LW link (pbement.com)
- A suite of Vision Sparse Autoencoders · Louka Ewington-Pitsos and RRGoyal · Oct 27, 2024, 4:05 AM · 25 points · 0 comments · 1 min read · LW link
- Ways to think about alignment · Abhimanyu Pallavi Sudhir · Oct 27, 2024, 1:40 AM · 6 points · 0 comments · 4 min read · LW link
- [Question] Is there a CFAR handbook audio option? · FinalFormal2 · Oct 26, 2024, 5:08 PM · 16 points · 0 comments · 1 min read · LW link
- Retrieval Augmented Genesis II — Holy Texts Semantics Analysis · João Ribeiro Medeiros · Oct 26, 2024, 5:00 PM · −1 points · 0 comments · 11 min read · LW link
- A superficially plausible promising alternate Earth without lockstep · Lorec · Oct 26, 2024, 4:04 PM · −2 points · 3 comments · 4 min read · LW link