LessWrong — Archive
Anthropic announces interpretability advances. How much does this advance alignment? · Seth Herd · May 21, 2024, 10:30 PM · 49 points · 4 comments · 3 min read · LW link · (www.anthropic.com)
[Question] What would stop you from paying for an LLM? · yanni kyriacos · May 21, 2024, 10:25 PM · 17 points · 15 comments · 1 min read · LW link
EIS XIII: Reflections on Anthropic’s SAE Research Circa May 2024 · scasper · May 21, 2024, 8:15 PM · 157 points · 16 comments · 3 min read · LW link
Mitigating extreme AI risks amid rapid progress [Linkpost] · Orpheus16 · May 21, 2024, 7:59 PM · 21 points · 7 comments · 4 min read · LW link
The problem with rationality · David Loomis · May 21, 2024, 6:49 PM · −17 points · 1 comment · 6 min read · LW link
rough draft on what happens in the brain when you have an insight · Emrik · May 21, 2024, 6:02 PM · 11 points · 2 comments · 1 min read · LW link
On Dwarkesh’s Podcast with OpenAI’s John Schulman · Zvi · May 21, 2024, 5:30 PM · 73 points · 4 comments · 20 min read · LW link · (thezvi.wordpress.com)
[Question] Is deleting capabilities still a relevant research question? · tailcalled · May 21, 2024, 1:24 PM · 15 points · 1 comment · 1 min read · LW link
New voluntary commitments (AI Seoul Summit) · Zach Stein-Perlman · May 21, 2024, 11:00 AM · 81 points · 17 comments · 7 min read · LW link · (www.gov.uk)
ACX/LW/EA/* Meetup Bremen · RasmusHB · May 21, 2024, 5:42 AM · 2 points · 0 comments · 1 min read · LW link
My Dating Heuristic · Declan Molony · May 21, 2024, 5:28 AM · 26 points · 4 comments · 2 min read · LW link
Scorable Functions: A Format for Algorithmic Forecasting · ozziegooen · May 21, 2024, 4:14 AM · 29 points · 0 comments · LW link
The Problem With the Word ‘Alignment’ · peligrietzer and particlemania · May 21, 2024, 3:48 AM · 63 points · 8 comments · 6 min read · LW link
What’s Going on With OpenAI’s Messaging? · ozziegooen · May 21, 2024, 2:22 AM · 191 points · 13 comments · LW link
Harmony Intelligence is Hiring! · James Dao and Soroush Pour · May 21, 2024, 2:11 AM · 10 points · 0 comments · 1 min read · LW link · (www.harmonyintelligence.com)
[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice. · Linch · May 20, 2024, 11:50 PM · 31 points · 8 comments · 1 min read · LW link · (variety.com)
Some perspectives on the discipline of Physics · Tahp · May 20, 2024, 6:19 PM · 17 points · 3 comments · 13 min read · LW link · (quark.rodeo)
[Question] Are there any groupchats for people working on Representation reading/control, activation steering type experiments? · Joe Kwon · May 20, 2024, 6:03 PM · 3 points · 1 comment · 1 min read · LW link
Interpretability: Integrated Gradients is a decent attribution method · Lucius Bushnaq, jake_mendel, StefanHex and Kaarel · May 20, 2024, 5:55 PM · 23 points · 7 comments · 6 min read · LW link
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks · Lucius Bushnaq, jake_mendel, Dan Braun, StefanHex, Nicholas Goldowsky-Dill, Kaarel, Avery, Joern Stoehler, debrevitatevitae, Magdalena Wache and Marius Hobbhahn · May 20, 2024, 5:53 PM · 107 points · 4 comments · 3 min read · LW link
NAO Updates, Spring 2024 · jefftk · May 20, 2024, 4:51 PM · 13 points · 0 comments · 6 min read · LW link · (naobservatory.org)
OpenAI: Exodus · Zvi · May 20, 2024, 1:10 PM · 153 points · 26 comments · 44 min read · LW link · (thezvi.wordpress.com)
Infra-Bayesian haggling · hannagabor · May 20, 2024, 12:23 PM · 28 points · 0 comments · 20 min read · LW link
Jaan Tallinn’s 2023 Philanthropy Overview · jaan · May 20, 2024, 12:11 PM · 203 points · 5 comments · 1 min read · LW link · (jaan.info)
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset] · abstractapplic · May 20, 2024, 9:38 AM · 31 points · 2 comments · 1 min read · LW link
Why I find Davidad’s plan interesting · Paul W · May 20, 2024, 8:13 AM · 18 points · 0 comments · 6 min read · LW link
Anthropic: Reflections on our Responsible Scaling Policy · Zac Hatfield-Dodds · May 20, 2024, 4:14 AM · 30 points · 21 comments · 10 min read · LW link · (www.anthropic.com)
The consistent guessing problem is easier than the halting problem · jessicata · May 20, 2024, 4:02 AM · 38 points · 5 comments · 4 min read · LW link · (unstableontology.com)
A poem titled ‘Tick Tock’. · Krantz · May 20, 2024, 3:52 AM · −1 points · 0 comments · 1 min read · LW link
Against Computers (infinite play) · rogersbacon · May 20, 2024, 12:43 AM · −11 points · 1 comment · 14 min read · LW link · (www.secretorum.life)
Testing for parallel reasoning in LLMs · meemi and Olli Järviniemi · May 19, 2024, 3:28 PM · 9 points · 7 comments · 9 min read · LW link
Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom) · O O · May 19, 2024, 2:18 AM · 14 points · 15 comments · 2 min read · LW link
Some “meta-cruxes” for AI x-risk debates · Aryeh Englander · May 19, 2024, 12:21 AM · 20 points · 2 comments · 3 min read · LW link
On Privilege · Shmi · May 18, 2024, 10:36 PM · 15 points · 10 comments · 2 min read · LW link
Fund me please—I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University · Johannes C. Mayer · May 18, 2024, 7:53 PM · 22 points · 37 comments · 6 min read · LW link
To Limit Impact, Limit KL-Divergence · J Bostock · May 18, 2024, 6:52 PM · 10 points · 1 comment · 5 min read · LW link
[Question] Are There Other Ideas as Generally Applicable as Natural Selection · Amin Sennour · May 18, 2024, 4:37 PM · 1 point · 1 comment · 1 min read · LW link
Scientific Notation Options · jefftk · May 18, 2024, 3:10 PM · 27 points · 13 comments · 1 min read · LW link · (www.jefftk.com)
“If we go extinct due to misaligned AI, at least nature will continue, right? … right?” · plex · May 18, 2024, 2:09 PM · 54 points · 23 comments · 2 min read · LW link · (aisafety.info)
What Are Non-Zero-Sum Games?—A Primer · James Stephen Brown · May 18, 2024, 9:19 AM · 4 points · 7 comments · 3 min read · LW link
DeepMind’s “Frontier Safety Framework” is weak and unambitious · Zach Stein-Perlman · May 18, 2024, 3:00 AM · 159 points · 14 comments · 4 min read · LW link
International Scientific Report on the Safety of Advanced AI: Key Information · Aryeh Englander · 18 May 2024 1:45 UTC · 39 points · 0 comments · 13 min read · LW link
Goodhart in RL with KL: Appendix · Thomas Kwa · 18 May 2024 0:40 UTC · 12 points · 0 comments · 6 min read · LW link
AI 2030 – AI Policy Roadmap · LTM · 17 May 2024 23:29 UTC · 8 points · 0 comments · 1 min read · LW link
MIT FutureTech are hiring for an Operations and Project Management role. · peterslattery · 17 May 2024 23:21 UTC · 2 points · 0 comments · 3 min read · LW link
Language Models Model Us · eggsyntax · 17 May 2024 21:00 UTC · 158 points · 55 comments · 7 min read · LW link
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems · Joar Skalse · 17 May 2024 19:13 UTC · 67 points · 10 comments · 2 min read · LW link
Agency · A* · 17 May 2024 19:11 UTC · 8 points · 0 comments · 1 min read · LW link
DeepMind: Frontier Safety Framework · Zach Stein-Perlman · 17 May 2024 17:30 UTC · 64 points · 0 comments · 3 min read · LW link · (deepmind.google)
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning · Dan Braun, Jordan Taylor, Nicholas Goldowsky-Dill and Lee Sharkey · 17 May 2024 16:25 UTC · 57 points · 20 comments · 4 min read · LW link · (arxiv.org)