The Problem With the Word ‘Alignment’
peligrietzer
and
particlemania
May 21, 2024, 3:48 AM
63
points
8
comments
6
min read
LW
link
What’s Going on With OpenAI’s Messaging?
ozziegooen
May 21, 2024, 2:22 AM
191
points
13
comments
3
min read
LW
link
Harmony Intelligence is Hiring!
James Dao
and
Soroush Pour
May 21, 2024, 2:11 AM
10
points
0
comments
1
min read
LW
link
(www.harmonyintelligence.com)
[Linkpost] Statement from Scarlett Johansson on OpenAI’s use of the “Sky” voice, that was shockingly similar to her own voice.
Linch
May 20, 2024, 11:50 PM
31
points
8
comments
1
min read
LW
link
(variety.com)
Some perspectives on the discipline of Physics
Tahp
May 20, 2024, 6:19 PM
17
points
3
comments
13
min read
LW
link
(quark.rodeo)
[Question]
Are there any groupchats for people working on Representation reading/control, activation steering type experiments?
Joe Kwon
May 20, 2024, 6:03 PM
3
points
1
comment
1
min read
LW
link
Interpretability: Integrated Gradients is a decent attribution method
Lucius Bushnaq
,
jake_mendel
,
StefanHex
and
Kaarel
May 20, 2024, 5:55 PM
23
points
7
comments
6
min read
LW
link
The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks
Lucius Bushnaq
,
jake_mendel
,
Dan Braun
,
StefanHex
,
Nicholas Goldowsky-Dill
,
Kaarel
,
Avery
,
Joern Stoehler
,
debrevitatevitae
,
Magdalena Wache
and
Marius Hobbhahn
May 20, 2024, 5:53 PM
108
points
4
comments
3
min read
LW
link
NAO Updates, Spring 2024
jefftk
May 20, 2024, 4:51 PM
13
points
0
comments
6
min read
LW
link
(naobservatory.org)
OpenAI: Exodus
Zvi
May 20, 2024, 1:10 PM
153
points
26
comments
44
min read
LW
link
(thezvi.wordpress.com)
Infra-Bayesian haggling
hannagabor
May 20, 2024, 12:23 PM
28
points
0
comments
20
min read
LW
link
Jaan Tallinn’s 2023 Philanthropy Overview
jaan
May 20, 2024, 12:11 PM
203
points
5
comments
1
min read
LW
link
(jaan.info)
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures [Evaluation and Ruleset]
abstractapplic
May 20, 2024, 9:38 AM
31
points
2
comments
1
min read
LW
link
Why I find Davidad’s plan interesting
Paul W
May 20, 2024, 8:13 AM
18
points
0
comments
6
min read
LW
link
Anthropic: Reflections on our Responsible Scaling Policy
Zac Hatfield-Dodds
May 20, 2024, 4:14 AM
30
points
21
comments
10
min read
LW
link
(www.anthropic.com)
The consistent guessing problem is easier than the halting problem
jessicata
May 20, 2024, 4:02 AM
38
points
5
comments
4
min read
LW
link
(unstableontology.com)
A poem titled ‘Tick Tock’.
Krantz
May 20, 2024, 3:52 AM
−1
points
0
comments
1
min read
LW
link
Against Computers (infinite play)
rogersbacon
May 20, 2024, 12:43 AM
−11
points
1
comment
14
min read
LW
link
(www.secretorum.life)
Testing for parallel reasoning in LLMs
meemi
and
Olli Järviniemi
May 19, 2024, 3:28 PM
9
points
7
comments
9
min read
LW
link
Hot take: The AI safety movement is way too sectarian and this is greatly increasing p(doom)
O O
May 19, 2024, 2:18 AM
14
points
15
comments
2
min read
LW
link
Some “meta-cruxes” for AI x-risk debates
Aryeh Englander
May 19, 2024, 12:21 AM
20
points
2
comments
3
min read
LW
link
On Privilege
Shmi
May 18, 2024, 10:36 PM
15
points
10
comments
2
min read
LW
link
Fund me please—I Work so Hard that my Feet start Bleeding and I Need to Infiltrate University
Johannes C. Mayer
May 18, 2024, 7:53 PM
22
points
37
comments
6
min read
LW
link
To Limit Impact, Limit KL-Divergence
J Bostock
May 18, 2024, 6:52 PM
10
points
1
comment
5
min read
LW
link
[Question]
Are There Other Ideas as Generally Applicable as Natural Selection
Amin Sennour
May 18, 2024, 4:37 PM
1
point
1
comment
1
min read
LW
link
Scientific Notation Options
jefftk
May 18, 2024, 3:10 PM
27
points
13
comments
1
min read
LW
link
(www.jefftk.com)
“If we go extinct due to misaligned AI, at least nature will continue, right? … right?”
plex
May 18, 2024, 2:09 PM
54
points
23
comments
2
min read
LW
link
(aisafety.info)
What Are Non-Zero-Sum Games?—A Primer
James Stephen Brown
May 18, 2024, 9:19 AM
4
points
7
comments
3
min read
LW
link
DeepMind’s “Frontier Safety Framework” is weak and unambitious
Zach Stein-Perlman
May 18, 2024, 3:00 AM
159
points
14
comments
4
min read
LW
link
International Scientific Report on the Safety of Advanced AI: Key Information
Aryeh Englander
May 18, 2024, 1:45 AM
39
points
0
comments
13
min read
LW
link
Goodhart in RL with KL: Appendix
Thomas Kwa
May 18, 2024, 12:40 AM
12
points
0
comments
6
min read
LW
link
AI 2030 – AI Policy Roadmap
LTM
May 17, 2024, 11:29 PM
8
points
0
comments
1
min read
LW
link
MIT FutureTech are hiring for an Operations and Project Management role.
peterslattery
May 17, 2024, 11:21 PM
2
points
0
comments
3
min read
LW
link
Language Models Model Us
eggsyntax
May 17, 2024, 9:00 PM
159
points
55
comments
7
min read
LW
link
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Joar Skalse
May 17, 2024, 7:13 PM
67
points
10
comments
2
min read
LW
link
Agency
A*
May 17, 2024, 7:11 PM
8
points
0
comments
1
min read
LW
link
DeepMind: Frontier Safety Framework
Zach Stein-Perlman
May 17, 2024, 5:30 PM
64
points
0
comments
3
min read
LW
link
(deepmind.google)
Identifying Functionally Important Features with End-to-End Sparse Dictionary Learning
Dan Braun
,
Jordan Taylor
,
Nicholas Goldowsky-Dill
and
Lee Sharkey
May 17, 2024, 4:25 PM
57
points
20
comments
4
min read
LW
link
(arxiv.org)
AISafety.com – Resources for AI Safety
Søren Elverlin
,
plex
,
Bryce Robertson
and
Melissa Samworth
May 17, 2024, 3:57 PM
83
points
3
comments
1
min read
LW
link
Is There Really a Child Penalty in the Long Run?
Maxwell Tabarrok
May 17, 2024, 11:56 AM
23
points
6
comments
5
min read
LW
link
(www.maximum-progress.com)
My Hammer Time Final Exam
adios
May 17, 2024, 9:28 AM
10
points
3
comments
3
min read
LW
link
[Question]
Is there a place to find the most cited LW articles of all time?
keltan
May 17, 2024, 1:20 AM
4
points
3
comments
1
min read
LW
link
D&D.Sci (Easy Mode): On The Construction Of Impossible Structures
abstractapplic
May 17, 2024, 12:25 AM
34
points
12
comments
2
min read
LW
link
To an LLM, everything looks like a logic puzzle
Jesse Richardson
May 16, 2024, 10:21 PM
14
points
2
comments
2
min read
LW
link
AI Safety Institute’s Inspect hello world example for AI evals
TheManxLoiner
May 16, 2024, 8:47 PM
3
points
0
comments
1
min read
LW
link
(lovkush.medium.com)
Feeling (instrumentally) Rational
Morphism
May 16, 2024, 6:56 PM
14
points
5
comments
1
min read
LW
link
Advice for Activists from the History of Environmentalism
Jeffrey Heninger
May 16, 2024, 6:40 PM
100
points
8
comments
6
min read
LW
link
(blog.aiimpacts.org)
Ninety-five theses on AI
hamandcheese
May 16, 2024, 5:51 PM
21
points
0
comments
7
min read
LW
link
GPT-4o My and Google I/O Day
Zvi
May 16, 2024, 5:50 PM
41
points
2
comments
37
min read
LW
link
(thezvi.wordpress.com)
AI #64: Feel the Mundane Utility
Zvi
May 16, 2024, 3:20 PM
28
points
11
comments
47
min read
LW
link
(thezvi.wordpress.com)