Being hella lost as rationality practice
Rachel Shu
Jun 24, 2024, 11:50 PM
14
points
0
comments
2
min read
LW
link
A Basic Economics-Style Model of AI Existential Risk
Rubi J. Hudson
Jun 24, 2024, 8:26 PM
24
points
3
comments
7
min read
LW
link
The Minority Coalition
Richard_Ngo
Jun 24, 2024, 8:01 PM
103
points
9
comments
5
min read
LW
link
(www.narrativeark.xyz)
Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC, rajashree, Adrià Garriga-alonso and Jason Gross
Jun 24, 2024, 7:27 PM
97
points
4
comments
8
min read
LW
link
(arxiv.org)
Contrapositive Natural Abstraction—Project Intro
Elliot Callender
Jun 24, 2024, 6:37 PM
4
points
5
comments
2
min read
LW
link
Sparse Features Through Time
Rogan Inglis
Jun 24, 2024, 6:06 PM
12
points
1
comment
1
min read
LW
link
(roganinglis.io)
PSA: Consider alternatives to AUROC when reporting classifier metrics for alignment
rpglover64
Jun 24, 2024, 5:53 PM
18
points
1
comment
3
min read
LW
link
Paying Russians to not invade Ukraine
djColliderBias
Jun 24, 2024, 5:46 PM
9
points
7
comments
3
min read
LW
link
SAE feature geometry is outside the superposition hypothesis
jake_mendel
Jun 24, 2024, 4:07 PM
228
points
17
comments
11
min read
LW
link
So you want to work on technical AI safety
gw
Jun 24, 2024, 2:29 PM
51
points
3
comments
14
min read
LW
link
The Future of Work: How Can Policymakers Prepare for AI’s Impact on Labor Markets?
davidconrad, Arturs and Tillman Schenk
Jun 24, 2024, 2:18 PM
5
points
0
comments
3
min read
LW
link
LLM Generality is a Timeline Crux
eggsyntax
Jun 24, 2024, 12:52 PM
218
points
119
comments
7
min read
LW
link
On Claude 3.5 Sonnet
Zvi
Jun 24, 2024, 12:00 PM
95
points
14
comments
13
min read
LW
link
(thezvi.wordpress.com)
Book Review: Righteous Victims—A History of the Zionist-Arab Conflict
Yair Halberstadt
Jun 24, 2024, 11:02 AM
53
points
8
comments
34
min read
LW
link
The Living Planet Index: A Case Study in Statistical Pitfalls
Jan_Kulveit
Jun 24, 2024, 10:05 AM
24
points
0
comments
4
min read
LW
link
(www.nature.com)
Sci-Fi books micro-reviews
Yair Halberstadt
Jun 24, 2024, 9:49 AM
44
points
27
comments
4
min read
LW
link
A Step Against Land Value Tax
Blog Alt
Jun 24, 2024, 5:13 AM
9
points
23
comments
6
min read
LW
link
(antematters.substack.com)
Different senses in which two AIs can be “the same”
Vivek Hebbar and Buck
Jun 24, 2024, 3:16 AM
69
points
2
comments
4
min read
LW
link
Talk: AI safety fieldbuilding at MATS
Ryan Kidd
Jun 23, 2024, 11:06 PM
26
points
2
comments
10
min read
LW
link
AI Labs Wouldn’t be Convicted of Treason or Sedition
Matthew Khoriaty
Jun 23, 2024, 9:34 PM
9
points
2
comments
3
min read
LW
link
Control Vectors as Dispositional Traits
Gianluca Calcagni
Jun 23, 2024, 9:34 PM
10
points
0
comments
11
min read
LW
link
“On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Claude 2024 [humor]
gwern
Jun 23, 2024, 9:18 PM
22
points
6
comments
1
min read
LW
link
(gwern.net)
[Question] How are you preparing for the possibility of an AI bust?
Nate Showell
Jun 23, 2024, 7:13 PM
26
points
16
comments
1
min read
LW
link
A simple text status can change something
nextcaller
Jun 23, 2024, 6:48 PM
5
points
0
comments
2
min read
LW
link
35 Interactive Learning Modules Relevant to EAs / Effective Altruism (that are all free)
spencerg
Jun 23, 2024, 5:57 PM
5
points
0
comments
LW
link
Podcasts: AGI Show, Consistently Candid, London Futurists
KatjaGrace
Jun 23, 2024, 1:50 PM
16
points
0
comments
1
min read
LW
link
(worldspiritsockpuppet.com)
Text Posts from the Kids Group: 2019
jefftk
Jun 23, 2024, 1:20 PM
23
points
0
comments
18
min read
LW
link
(www.jefftk.com)
Population ethics and the value of variety
cousin_it
Jun 23, 2024, 10:42 AM
24
points
11
comments
2
min read
LW
link
[Question] Karma votes: blind to or accounting for score?
cata
Jun 22, 2024, 9:40 PM
19
points
4
comments
1
min read
LW
link
[Question] Should effective altruism be more “cool”?
jaredmantell
Jun 22, 2024, 8:42 PM
3
points
3
comments
1
min read
LW
link
Meta Alignment: Communication Wack-a-Mole
Bridgett Kay
Jun 22, 2024, 8:12 PM
16
points
2
comments
5
min read
LW
link
(dxmrevealed.wordpress.com)
AI as a computing platform: what to expect
Jonasb
Jun 22, 2024, 7:55 PM
−3
points
0
comments
7
min read
LW
link
(www.denominations.io)
Expected number of tries
adios
Jun 22, 2024, 7:22 PM
6
points
0
comments
2
min read
LW
link
Applying Force to the Wrong End of a Causal Chain
silentbob
Jun 22, 2024, 6:06 PM
41
points
0
comments
9
min read
LW
link
Bed Time Quests & Dinner Games for 3-5 year olds
Gunnar_Zarncke and Shoshannah Tekofsky
Jun 22, 2024, 7:53 AM
51
points
0
comments
1
min read
LW
link
(kidquest.substack.com)
Appraising aggregativism and utilitarianism
Cleo Nardo
Jun 21, 2024, 11:10 PM
27
points
10
comments
19
min read
LW
link
Best-of-n with misaligned reward models for Math reasoning
Fabien Roger
Jun 21, 2024, 10:53 PM
25
points
0
comments
3
min read
LW
link
No really, the Sticker Shortcut fallacy is indeed a fallacy
ymeskhout
Jun 21, 2024, 10:27 PM
11
points
2
comments
5
min read
LW
link
(www.ymeskhout.com)
Sarajevo 1914: Black Swan Questions
SebastianG
Jun 21, 2024, 9:27 PM
8
points
0
comments
2
min read
LW
link
Yudkowsky is too optimistic about how AI will treat humans.
ProfessorFalken
Jun 21, 2024, 7:01 PM
0
points
1
comment
1
min read
LW
link
Juneberry Puffs
jefftk
Jun 21, 2024, 6:50 PM
15
points
0
comments
1
min read
LW
link
(www.jefftk.com)
Let’s Design a School, Part 3.2 Costs
Sable
Jun 21, 2024, 5:58 PM
8
points
0
comments
5
min read
LW
link
(affablyevil.substack.com)
2022 AI Alignment Course: 5→37% working on AI safety
Dewi
Jun 21, 2024, 5:45 PM
7
points
3
comments
3
min read
LW
link
Some Thoughts on AI Alignment: Using AI to Control AI
eigenvalue
Jun 21, 2024, 5:44 PM
1
point
1
comment
1
min read
LW
link
(github.com)
What distinguishes “early”, “mid” and “end” games?
Raemon
Jun 21, 2024, 5:41 PM
48
points
22
comments
1
min read
LW
link
Nuclear War, Map and Territory, Values | Guild of the Rose Newsletter, May 2024
moridinamael
Jun 21, 2024, 5:39 PM
18
points
0
comments
4
min read
LW
link
(guildoftherose.org)
AI governance needs a theory of victory
Corin Katzke and Justin Bullock
Jun 21, 2024, 4:15 PM
45
points
8
comments
LW
link
(www.convergenceanalysis.org)
Connecting the Dots: LLMs can Infer & Verbalize Latent Structure from Training Data
Johannes Treutlein and Owain_Evans
Jun 21, 2024, 3:54 PM
163
points
13
comments
8
min read
LW
link
(arxiv.org)
On OpenAI’s Model Spec
Zvi
Jun 21, 2024, 1:00 PM
47
points
4
comments
30
min read
LW
link
(thezvi.wordpress.com)
Attention Output SAEs Improve Circuit Analysis
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Jun 21, 2024, 12:56 PM
33
points
3
comments
19
min read
LW
link