hydrogen tube transport
bhauth
Apr 18, 2024, 10:47 PM
34
points
12
comments
5
min read
LW
link
(www.bhauth.com)
LessOnline Festival Updates Thread
Ben Pace
Apr 18, 2024, 9:55 PM
59
points
26
comments
1
min read
LW
link
A Review of In-Context Learning Hypotheses for Automated AI Alignment Research
alamerton
Apr 18, 2024, 6:29 PM
25
points
4
comments
16
min read
LW
link
I’m open for projects (sort of)
cousin_it
Apr 18, 2024, 6:05 PM
46
points
13
comments
1
min read
LW
link
Blessed information, garbage information, cursed information
tailcalled
Apr 18, 2024, 4:56 PM
23
points
8
comments
3
min read
LW
link
[Fiction] A Confession
Arjun Panickssery
Apr 18, 2024, 4:28 PM
38
points
2
comments
5
min read
LW
link
(arjunpanickssery.substack.com)
Discriminating Behaviorally Identical Classifiers: a model problem for applying interpretability to scalable oversight
Sam Marks
Apr 18, 2024, 4:17 PM
113
points
10
comments
12
min read
LW
link
Cooperation is optimal, with weaker agents too - tldr
Ryo
Apr 18, 2024, 3:03 PM
12
points
22
comments
4
min read
LW
link
(medium.com)
How to coordinate despite our biases? - tldr
Ryo
Apr 18, 2024, 3:03 PM
3
points
2
comments
3
min read
LW
link
(medium.com)
Knowledge Base 7: Long-tail knowledge and collective intelligence
iwis
Apr 18, 2024, 2:21 PM
−6
points
0
comments
1
min read
LW
link
AI #60: Oh the Humanity
Zvi
Apr 18, 2024, 2:10 PM
44
points
7
comments
62
min read
LW
link
(thezvi.wordpress.com)
UDT1.01: Logical Inductors and Implicit Beliefs (5/10)
Diffractor
Apr 18, 2024, 8:39 AM
34
points
2
comments
19
min read
LW
link
An examination of GPT-2’s boring yet effective glitch
MiguelDev
Apr 18, 2024, 5:26 AM
5
points
3
comments
3
min read
LW
link
[Question]
What if Ethics is Provably Self-Contradictory?
Yitz
Apr 18, 2024, 5:12 AM
3
points
7
comments
2
min read
LW
link
The Mom Test: Summary and Thoughts
Adam Zerner
Apr 18, 2024, 3:34 AM
48
points
3
comments
10
min read
LW
link
Express interest in an “FHI of the West”
habryka
Apr 18, 2024, 3:32 AM
268
points
41
comments
3
min read
LW
link
Why Would Belief-States Have A Fractal Structure, And Why Would That Matter For Interpretability? An Explainer
johnswentworth
and
David Lorell
Apr 18, 2024, 12:27 AM
185
points
21
comments
7
min read
LW
link
AXRP Episode 28 - Suing Labs for AI Risk with Gabriel Weil
DanielFilan
Apr 17, 2024, 9:42 PM
12
points
0
comments
65
min read
LW
link
LLM Evaluators Recognize and Favor Their Own Generations
Arjun Panickssery
,
Sam Bowman
and
Shi Feng
Apr 17, 2024, 9:09 PM
45
points
1
comment
3
min read
LW
link
(tiny.cc)
SFS: Foundations of Forecasting
MAD2
Apr 17, 2024, 5:46 PM
3
points
0
comments
1
min read
LW
link
An ethical framework to supersede Utilitarianism
metalcrow
Apr 17, 2024, 5:18 PM
1
point
4
comments
4
min read
LW
link
Moving on from community living
Vika
Apr 17, 2024, 5:02 PM
63
points
7
comments
3
min read
LW
link
(vkrakovna.wordpress.com)
Staged release
Zach Stein-Perlman
Apr 17, 2024, 4:00 PM
11
points
4
comments
2
min read
LW
link
[Question]
Discomfort Stacking
Lewis O’Brien
Apr 17, 2024, 2:49 PM
5
points
12
comments
1
min read
LW
link
FHI (Future of Humanity Institute) has shut down (2005–2024)
gwern
Apr 17, 2024, 1:54 PM
176
points
22
comments
1
min read
LW
link
(www.futureofhumanityinstitute.org)
Childhood and Education Roundup #5
Zvi
Apr 17, 2024, 1:00 PM
37
points
3
comments
25
min read
LW
link
(thezvi.wordpress.com)
Should we maximize the Geometric Expectation of Utility?
A.H.
Apr 17, 2024, 10:37 AM
5
points
17
comments
9
min read
LW
link
Claude 3 Opus can operate as a Turing machine
Gunnar_Zarncke
Apr 17, 2024, 8:41 AM
36
points
2
comments
1
min read
LW
link
(twitter.com)
When is a mind me?
Rob Bensinger
Apr 17, 2024, 5:56 AM
144
points
130
comments
15
min read
LW
link
Mid-conditional love
KatjaGrace
Apr 17, 2024, 4:00 AM
76
points
21
comments
2
min read
LW
link
(worldspiritsockpuppet.com)
Spending Update 2024
jefftk
Apr 17, 2024, 2:30 AM
20
points
2
comments
3
min read
LW
link
(www.jefftk.com)
Anti MMAcevedo Protocol
Logan Zoellner
Apr 16, 2024, 10:32 PM
1
point
1
comment
8
min read
LW
link
Transformers Represent Belief State Geometry in their Residual Stream
Adam Shai
Apr 16, 2024, 9:16 PM
419
points
100
comments
12
min read
LW
link
Tinker
Richard_Ngo
Apr 16, 2024, 6:26 PM
38
points
0
comments
1
min read
LW
link
(press.asimov.com)
Paul Christiano named as US AI Safety Institute Head of AI Safety
Joel Burget
Apr 16, 2024, 4:22 PM
256
points
58
comments
1
min read
LW
link
(www.commerce.gov)
Creating unrestricted AI Agents with Command R+
Simon Lermen
Apr 16, 2024, 2:52 PM
77
points
13
comments
5
min read
LW
link
What should the EA community learn from the FTX / SBF disaster? An in-depth discussion with Will MacAskill on the Clearer Thinking podcast
spencerg
Apr 16, 2024, 1:11 PM
20
points
0
comments
LW
link
(podcast.clearerthinking.org)
{Book Summary} The Art of Gathering
Tristan Williams
Apr 16, 2024, 10:48 AM
28
points
0
comments
LW
link
Essay competition on the Automation of Wisdom and Philosophy — $25k in prizes
owencb
and
AI Impacts
Apr 16, 2024, 10:10 AM
82
points
12
comments
8
min read
LW
link
(blog.aiimpacts.org)
Announcing SPAR Summer 2024!
laurenmarie12
Apr 16, 2024, 8:30 AM
30
points
2
comments
1
min read
LW
link
The argument for near-term human disempowerment through AI
Chris_Leong
Apr 16, 2024, 4:50 AM
21
points
2
comments
1
min read
LW
link
(link.springer.com)
My experience using financial commitments to overcome akrasia
William Howard
Apr 15, 2024, 10:57 PM
137
points
33
comments
18
min read
LW
link
An evaluation of circuit evaluation metrics
Iván Arcuschin
,
Niels uit de Bos
and
Adrià Garriga-alonso
Apr 15, 2024, 7:38 PM
18
points
0
comments
4
min read
LW
link
Experiments with an alternative method to promote sparsity in sparse autoencoders
Eoin Farrell
Apr 15, 2024, 6:21 PM
29
points
7
comments
12
min read
LW
link
Effectively Handling Disagreements—Introducing a New Workshop
Camille Berger
Apr 15, 2024, 4:33 PM
37
points
2
comments
7
min read
LW
link
Four Local Gigs
jefftk
Apr 15, 2024, 4:00 PM
8
points
0
comments
1
min read
LW
link
(www.jefftk.com)
Taking into account preferences of past selves
Jacob G-W
15 Apr 2024 13:15 UTC
14
points
9
comments
7
min read
LW
link
Monthly Roundup #17: April 2024
Zvi
15 Apr 2024 12:10 UTC
54
points
4
comments
76
min read
LW
link
(thezvi.wordpress.com)
Reconsider the anti-cavity bacteria if you are Asian
Lao Mein
15 Apr 2024 7:02 UTC
170
points
43
comments
4
min read
LW
link
Anthropic AI made the right call
bhauth
15 Apr 2024 0:39 UTC
22
points
20
comments
1
min read
LW
link