FarmKind’s Illusory Offer
jefftk
Aug 9, 2024, 11:30 AM
71
points
5
comments
3
min read
LW
link
(www.jefftk.com)
Please do not use AI to write for you
Richard_Kennaway
Aug 21, 2024, 9:53 AM
69
points
34
comments
4
min read
LW
link
What is it to solve the alignment problem? (Notes)
Joe Carlsmith
Aug 24, 2024, 9:19 PM
69
points
18
comments
53
min read
LW
link
The Hessian rank bounds the learning coefficient
Lucius Bushnaq
Aug 8, 2024, 8:55 PM
68
points
10
comments
4
min read
LW
link
Showing SAE Latents Are Not Atomic Using Meta-SAEs
Bart Bussmann, Michael Pearce, Patrick Leask, Joseph Bloom, Lee Sharkey and Neel Nanda
Aug 24, 2024, 12:56 AM
68
points
10
comments
20
min read
LW
link
GPT-4o System Card
Zach Stein-Perlman
Aug 8, 2024, 8:30 PM
68
points
11
comments
2
min read
LW
link
(openai.com)
AI #79: Ready for Some Football
Zvi
Aug 29, 2024, 1:30 PM
68
points
16
comments
32
min read
LW
link
(thezvi.wordpress.com)
Why Large Bureaucratic Organizations?
johnswentworth
Aug 27, 2024, 6:30 PM
68
points
52
comments
12
min read
LW
link
The economics of space tethers
harsimony
Aug 22, 2024, 4:15 PM
67
points
22
comments
7
min read
LW
link
(splittinginfinity.substack.com)
Fear of centralized power vs. fear of misaligned AGI: Vitalik Buterin on 80,000 Hours
Seth Herd
Aug 5, 2024, 3:38 PM
66
points
22
comments
5
min read
LW
link
A primer on why computational predictive toxicology is hard
Abhishaike Mahajan
Aug 19, 2024, 5:16 PM
63
points
2
comments
12
min read
LW
link
(www.owlposting.com)
Interdictor Ship
lsusr
Aug 19, 2024, 4:59 AM
63
points
9
comments
7
min read
LW
link
Outrage Bonding
Jonathan Moregård
Aug 9, 2024, 1:46 PM
63
points
12
comments
2
min read
LW
link
(honestliving.substack.com)
Rationalists are missing a core piece for agent-like structure (energy vs information overload)
tailcalled
Aug 17, 2024, 9:57 AM
62
points
9
comments
4
min read
LW
link
AI #78: Some Welcome Calm
Zvi
Aug 22, 2024, 2:20 PM
61
points
15
comments
33
min read
LW
link
(thezvi.wordpress.com)
Self-explaining SAE features
Dmitrii Kharlapenko, neverix, Neel Nanda and Arthur Conmy
Aug 5, 2024, 10:20 PM
60
points
13
comments
10
min read
LW
link
… Wait, our models of semantics should inform fluid mechanics?!?
johnswentworth and David Lorell
Aug 26, 2024, 4:38 PM
59
points
18
comments
4
min read
LW
link
Announcing the $200k EA Community Choice
Austin Chen
Aug 14, 2024, 12:39 AM
58
points
8
comments
LW
link
(manifund.substack.com)
Congressional Insider Trading
Maxwell Tabarrok
Aug 30, 2024, 1:32 PM
57
points
6
comments
7
min read
LW
link
(www.maximum-progress.com)
You’re a Space Wizard, Luke
lsusr
Aug 18, 2024, 5:35 AM
57
points
6
comments
2
min read
LW
link
Referendum Mechanics in a Marketplace of Ideas
Martin Sustrik
Aug 25, 2024, 8:30 AM
57
points
2
comments
5
min read
LW
link
(250bpm.substack.com)
The Bitter Lesson for AI Safety Research
adamk, Richard Ren, Dan H and Gabe M
Aug 2, 2024, 6:39 PM
57
points
5
comments
3
min read
LW
link
Some Unorthodox Ways To Achieve High GDP Growth
johnswentworth and David Lorell
Aug 8, 2024, 6:58 PM
57
points
6
comments
6
min read
LW
link
John Schulman leaves OpenAI for Anthropic [and then left Anthropic again for Thinking Machines]
Sodium
Aug 6, 2024, 1:23 AM
57
points
0
comments
1
min read
LW
link
Measuring Structure Development in Algorithmic Transformers
Micurie and Einar Urdshals
Aug 22, 2024, 8:38 AM
56
points
4
comments
11
min read
LW
link
Thiel on AI & Racing with China
Ben Pace
Aug 20, 2024, 3:19 AM
55
points
10
comments
12
min read
LW
link
Demis Hassabis — Google DeepMind: The Podcast
Zach Stein-Perlman
Aug 16, 2024, 12:00 AM
55
points
8
comments
3
min read
LW
link
(www.youtube.com)
Owain Evans on Situational Awareness and Out-of-Context Reasoning in LLMs
Michaël Trazzi
Aug 24, 2024, 4:30 AM
55
points
0
comments
5
min read
LW
link
[LDSL#0] Some epistemological conundrums
tailcalled
Aug 7, 2024, 7:52 PM
54
points
11
comments
10
min read
LW
link
Provably Safe AI: Worldview and Projects
Ben Goldhaber and Steve_Omohundro
Aug 9, 2024, 11:21 PM
54
points
44
comments
7
min read
LW
link
Calendar feature geometry in GPT-2 layer 8 residual stream SAEs
Patrick Leask, Bart Bussmann and Neel Nanda
Aug 17, 2024, 1:16 AM
53
points
0
comments
5
min read
LW
link
Extended Interview with Zhukeepa on Religion
Ben Pace and zhukeepa
Aug 18, 2024, 3:19 AM
53
points
61
comments
119
min read
LW
link
AI Rights for Human Safety
Simon Goldstein
Aug 1, 2024, 11:01 PM
53
points
6
comments
1
min read
LW
link
(papers.ssrn.com)
AI #76: Six Shorts Stories About OpenAI
Zvi
Aug 8, 2024, 1:50 PM
53
points
10
comments
48
min read
LW
link
(thezvi.wordpress.com)
Rewilding the Gut VS the Autoimmune Epidemic
GGD
Aug 16, 2024, 6:00 PM
51
points
0
comments
3
min read
LW
link
Decision Theory in Space
lsusr
Aug 18, 2024, 7:02 AM
50
points
18
comments
2
min read
LW
link
Interoperable High Level Structures: Early Thoughts on Adjectives
johnswentworth and David Lorell
Aug 22, 2024, 9:12 PM
49
points
1
comment
7
min read
LW
link
SRE’s review of Democracy
Martin Sustrik
Aug 3, 2024, 7:20 AM
48
points
2
comments
3
min read
LW
link
(250bpm.substack.com)
What’s important in “AI for epistemics”?
Lukas Finnveden
Aug 24, 2024, 1:27 AM
48
points
0
comments
28
min read
LW
link
(www.forethought.org)
Trustworthy and untrustworthy models
Olli Järviniemi
Aug 19, 2024, 4:27 PM
47
points
3
comments
8
min read
LW
link
All The Latest Human tFUS Studies
sarahconstantin
Aug 9, 2024, 10:20 PM
46
points
2
comments
8
min read
LW
link
(sarahconstantin.substack.com)
Humanity isn’t remotely longtermist, so arguments for AGI x-risk should focus on the near term
Seth Herd
12 Aug 2024 18:10 UTC
46
points
10
comments
1
min read
LW
link
We’re not as 3-Dimensional as We Think
silentbob
4 Aug 2024 14:39 UTC
46
points
17
comments
5
min read
LW
link
How to hire somebody better than yourself
lemonhope
28 Aug 2024 8:12 UTC
46
points
5
comments
5
min read
LW
link
AI #75: Math is Easier
Zvi
1 Aug 2024 13:40 UTC
46
points
25
comments
72
min read
LW
link
(thezvi.wordpress.com)
Principled Satisficing To Avoid Goodhart
JenniferRM
16 Aug 2024 19:05 UTC
45
points
2
comments
8
min read
LW
link
Startup Roundup #2
Zvi
6 Aug 2024 13:30 UTC
45
points
0
comments
32
min read
LW
link
(thezvi.wordpress.com)
Case Study: Interpreting, Manipulating, and Controlling CLIP With Sparse Autoencoders
Gytis Daujotas
1 Aug 2024 21:08 UTC
45
points
7
comments
7
min read
LW
link
[Question] “Deception Genre” What Books are like Project Lawful?
Double
28 Aug 2024 17:19 UTC
45
points
20
comments
1
min read
LW
link
In defense of technological unemployment as the main AI concern
tailcalled
27 Aug 2024 17:58 UTC
44
points
36
comments
1
min read
LW
link