In favour of exploring nagging doubts about x-risk
owencb
Jun 25, 2024, 11:52 PM
105
points
2
comments
LW
link
What is a Tool?
johnswentworth
and
David Lorell
Jun 25, 2024, 11:40 PM
62
points
4
comments
6
min read
LW
link
[Question]
When do alignment researchers retire?
Jordan Taylor
Jun 25, 2024, 11:30 PM
4
points
2
comments
1
min read
LW
link
Compute Governance Literature Review
sijarvis
Jun 25, 2024, 10:41 PM
11
points
0
comments
13
min read
LW
link
Computational Complexity as an Intuition Pump for LLM Generality
aribrill
Jun 25, 2024, 8:25 PM
18
points
6
comments
3
min read
LW
link
Failure Modes of Teaching AI Safety
Eleni Angelou
Jun 25, 2024, 7:07 PM
20
points
0
comments
1
min read
LW
link
Kingfisher Summer Tour 2024
jefftk
Jun 25, 2024, 6:50 PM
9
points
0
comments
1
min read
LW
link
(www.jefftk.com)
Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes
Jun 25, 2024, 5:49 PM
30
points
1
comment
28
min read
LW
link
An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen
Jun 25, 2024, 3:57 PM
27
points
0
comments
9
min read
LW
link
(adamkarvonen.github.io)
Formal verification, heuristic explanations and surprise accounting
Jacob_Hilton
Jun 25, 2024, 3:40 PM
156
points
11
comments
9
min read
LW
link
(www.alignment.org)
Metastrategy get-started guide
Tahp
Jun 25, 2024, 3:04 PM
6
points
1
comment
8
min read
LW
link
Labor Participation is an Alignment Risk
alex
Jun 25, 2024, 2:15 PM
−5
points
2
comments
17
min read
LW
link
Monthly Roundup #19: June 2024
Zvi
Jun 25, 2024, 12:00 PM
28
points
9
comments
54
min read
LW
link
(thezvi.wordpress.com)
Regularly meta-optimization
Crazy philosopher
Jun 25, 2024, 6:12 AM
−4
points
6
comments
1
min read
LW
link
Memetics as an analogy and its implicit connotations
Rachel Shu
Jun 25, 2024, 5:13 AM
3
points
0
comments
3
min read
LW
link
Mistakes people make when thinking about units
Isaac King
Jun 25, 2024, 3:39 AM
74
points
14
comments
7
min read
LW
link
Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu
Jun 25, 2024, 1:35 AM
46
points
9
comments
3
min read
LW
link
I’m a bit skeptical of AlphaFold 3
Oleg Trott
Jun 25, 2024, 12:04 AM
87
points
14
comments
2
min read
LW
link
Being hella lost as rationality practice
Rachel Shu
Jun 24, 2024, 11:50 PM
14
points
0
comments
2
min read
LW
link
A Basic Economics-Style Model of AI Existential Risk
Rubi J. Hudson
Jun 24, 2024, 8:26 PM
24
points
3
comments
7
min read
LW
link
The Minority Coalition
Richard_Ngo
Jun 24, 2024, 8:01 PM
103
points
9
comments
5
min read
LW
link
(www.narrativeark.xyz)
Compact Proofs of Model Performance via Mechanistic Interpretability
LawrenceC
,
rajashree
,
Adrià Garriga-alonso
and
Jason Gross
Jun 24, 2024, 7:27 PM
97
points
4
comments
8
min read
LW
link
(arxiv.org)
Contrapositive Natural Abstraction—Project Intro
Elliot Callender
Jun 24, 2024, 6:37 PM
4
points
5
comments
2
min read
LW
link
Sparse Features Through Time
Rogan Inglis
Jun 24, 2024, 6:06 PM
12
points
1
comment
1
min read
LW
link
(roganinglis.io)
PSA: Consider alternatives to AUROC when reporting classifier metrics for alignment
rpglover64
Jun 24, 2024, 5:53 PM
18
points
1
comment
3
min read
LW
link
Paying Russians to not invade Ukraine
djColliderBias
Jun 24, 2024, 5:46 PM
9
points
7
comments
3
min read
LW
link
SAE feature geometry is outside the superposition hypothesis
jake_mendel
Jun 24, 2024, 4:07 PM
228
points
17
comments
11
min read
LW
link
So you want to work on technical AI safety
gw
Jun 24, 2024, 2:29 PM
51
points
3
comments
14
min read
LW
link
The Future of Work: How Can Policymakers Prepare for AI’s Impact on Labor Markets?
davidconrad
,
Arturs
and
Tillman Schenk
Jun 24, 2024, 2:18 PM
5
points
0
comments
3
min read
LW
link
LLM Generality is a Timeline Crux
eggsyntax
Jun 24, 2024, 12:52 PM
218
points
119
comments
7
min read
LW
link
On Claude 3.5 Sonnet
Zvi
Jun 24, 2024, 12:00 PM
95
points
14
comments
13
min read
LW
link
(thezvi.wordpress.com)
Book Review: Righteous Victims—A History of the Zionist-Arab Conflict
Yair Halberstadt
Jun 24, 2024, 11:02 AM
53
points
8
comments
34
min read
LW
link
The Living Planet Index: A Case Study in Statistical Pitfalls
Jan_Kulveit
Jun 24, 2024, 10:05 AM
24
points
0
comments
4
min read
LW
link
(www.nature.com)
Sci-Fi books micro-reviews
Yair Halberstadt
Jun 24, 2024, 9:49 AM
44
points
27
comments
4
min read
LW
link
A Step Against Land Value Tax
Blog Alt
Jun 24, 2024, 5:13 AM
9
points
23
comments
6
min read
LW
link
(antematters.substack.com)
Different senses in which two AIs can be “the same”
Vivek Hebbar
and
Buck
Jun 24, 2024, 3:16 AM
69
points
2
comments
4
min read
LW
link
Talk: AI safety fieldbuilding at MATS
Ryan Kidd
Jun 23, 2024, 11:06 PM
26
points
2
comments
10
min read
LW
link
AI Labs Wouldn’t be Convicted of Treason or Sedition
Matthew Khoriaty
Jun 23, 2024, 9:34 PM
9
points
2
comments
3
min read
LW
link
Control Vectors as Dispositional Traits
Gianluca Calcagni
Jun 23, 2024, 9:34 PM
10
points
0
comments
11
min read
LW
link
“On the Impossibility of Superintelligent Rubik’s Cube Solvers”, Claude 2024 [humor]
gwern
Jun 23, 2024, 9:18 PM
22
points
6
comments
1
min read
LW
link
(gwern.net)
[Question]
How are you preparing for the possibility of an AI bust?
Nate Showell
Jun 23, 2024, 7:13 PM
26
points
16
comments
1
min read
LW
link
A simple text status can change something
nextcaller
Jun 23, 2024, 6:48 PM
5
points
0
comments
2
min read
LW
link
35 Interactive Learning Modules Relevant to EAs / Effective Altruism (that are all free)
spencerg
Jun 23, 2024, 5:57 PM
5
points
0
comments
LW
link
Podcasts: AGI Show, Consistently Candid, London Futurists
KatjaGrace
Jun 23, 2024, 1:50 PM
16
points
0
comments
1
min read
LW
link
(worldspiritsockpuppet.com)
Text Posts from the Kids Group: 2019
jefftk
Jun 23, 2024, 1:20 PM
23
points
0
comments
18
min read
LW
link
(www.jefftk.com)
Population ethics and the value of variety
cousin_it
Jun 23, 2024, 10:42 AM
24
points
11
comments
2
min read
LW
link
[Question]
Karma votes: blind to or accounting for score?
cata
22 Jun 2024 21:40 UTC
19
points
4
comments
1
min read
LW
link
[Question]
Should effective altruism be more “cool”?
jaredmantell
22 Jun 2024 20:42 UTC
3
points
3
comments
1
min read
LW
link
Meta Alignment: Communication Wack-a-Mole
Bridgett Kay
22 Jun 2024 20:12 UTC
16
points
2
comments
5
min read
LW
link
(dxmrevealed.wordpress.com)
AI as a computing platform: what to expect
Jonasb
22 Jun 2024 19:55 UTC
−3
points
0
comments
7
min read
LW
link
(www.denominations.io)