Archive: Page 2
“Absence of Evidence is Not Evidence of Absence” As a Limit · transhumanist_atom_understander · Oct 1, 2023, 8:15 AM · 16 points · 1 comment · 2 min read · LW link
New Tool: the Residual Stream Viewer · AdamYedidia · Oct 1, 2023, 12:49 AM · 32 points · 7 comments · 4 min read · LW link · (tinyurl.com)
My Effortless Weightloss Story: A Quick Runthrough · CuoreDiVetro · Sep 30, 2023, 11:02 PM · 124 points · 78 comments · 9 min read · LW link
Arguments for moral indefinability · Richard_Ngo · Sep 30, 2023, 10:40 PM · 47 points · 16 comments · 7 min read · LW link · (www.thinkingcomplete.com)
Conditionals All The Way Down · lunatic_at_large · Sep 30, 2023, 9:06 PM · 33 points · 2 comments · 3 min read · LW link
Focusing your impact on short vs long TAI timelines · kuhanj · Sep 30, 2023, 7:34 PM · 4 points · 0 comments · 10 min read · LW link
How model editing could help with the alignment problem · Michael Ripa · Sep 30, 2023, 5:47 PM · 12 points · 1 comment · 15 min read · LW link
My submission to the ALTER Prize · Lorxus · Sep 30, 2023, 4:07 PM · 6 points · 0 comments · 1 min read · LW link · (www.docdroid.net)
Anki deck for learning the main AI safety orgs, projects, and programs · Bryce Robertson · Sep 30, 2023, 4:06 PM · 2 points · 0 comments · 1 min read · LW link
The Lighthaven Campus is open for bookings · habryka · Sep 30, 2023, 1:08 AM · 209 points · 18 comments · 4 min read · LW link · (www.lighthaven.space)
Headphones hook · philh · Sep 29, 2023, 10:50 PM · 21 points · 1 comment · 3 min read · LW link · (reasonableapproximation.net)
Paul Christiano’s views on “doom” (video explainer) · Michaël Trazzi · Sep 29, 2023, 9:56 PM · 15 points · 0 comments · 1 min read · LW link · (youtu.be)
The Retroactive Funding Landscape: Innovations for Donors and Grantmakers · Dawn Drescher · Sep 29, 2023, 5:39 PM · 13 points · 0 comments · LW link · (impactmarkets.substack.com)
Bids To Defer On Value Judgements · johnswentworth · Sep 29, 2023, 5:07 PM · 58 points · 6 comments · 3 min read · LW link
Announcing FAR Labs, an AI safety coworking space · Ben Goldhaber · Sep 29, 2023, 4:52 PM · 95 points · 0 comments · 1 min read · LW link
A tool for searching rationalist & EA webs · Daniel_Friedrich · Sep 29, 2023, 3:23 PM · 4 points · 0 comments · 1 min read · LW link · (ratsearch.blogspot.com)
Basic Mathematics of Predictive Coding · Adam Shai · Sep 29, 2023, 2:38 PM · 49 points · 6 comments · 9 min read · LW link
“Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation · titotal · Sep 29, 2023, 2:01 PM · 160 points · 79 comments · LW link · (titotal.substack.com)
Steering subsystems: capabilities, agency, and alignment · Seth Herd · Sep 29, 2023, 1:45 PM · 31 points · 0 comments · 8 min read · LW link
Apply to Usable Security Prize by September 30 · Allison Duettmann · Sep 29, 2023, 1:39 PM · 4 points · 0 comments · 1 min read · LW link
List of how people have become more hard-working · Chi Nguyen · Sep 29, 2023, 11:30 AM · 69 points · 7 comments · LW link
Resolving moral uncertainty with randomization · B Jacobs and Jobst Heitzig · Sep 29, 2023, 11:23 AM · 7 points · 1 comment · 11 min read · LW link
EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem · Elizabeth · Sep 28, 2023, 11:30 PM · 323 points · 250 comments · 22 min read · LW link · 2 reviews · (acesounderglass.com)
Competitive, Cooperative, and Cohabitive · Screwtape · Sep 28, 2023, 11:25 PM · 49 points · 13 comments · 5 min read · LW link · 1 review
The Coming Wave · PeterMcCluskey · Sep 28, 2023, 10:59 PM · 27 points · 1 comment · 6 min read · LW link · (bayesianinvestor.com)
High-level interpretability: detecting an AI’s objectives · Paul Colognese and Jozdien · Sep 28, 2023, 7:30 PM · 72 points · 4 comments · 21 min read · LW link
How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions · JanB, Owain_Evans and SoerenMind · Sep 28, 2023, 6:53 PM · 187 points · 39 comments · 3 min read · LW link · 1 review
Responsible scaling policy TLDR · lemonhope · Sep 28, 2023, 6:51 PM · 9 points · 0 comments · 1 min read · LW link
Alignment Workshop talks · Richard_Ngo · Sep 28, 2023, 6:26 PM · 37 points · 1 comment · 1 min read · LW link · (www.alignment-workshop.com)
My Current Thoughts on the AI Strategic Landscape · Jeffrey Heninger · Sep 28, 2023, 5:59 PM · 11 points · 28 comments · 14 min read · LW link
My Arrogant Plan for Alignment · MrArrogant · Sep 28, 2023, 5:51 PM · 2 points · 6 comments · 6 min read · LW link
Discursive Competence in ChatGPT, Part 2: Memory for Texts · Bill Benzon · Sep 28, 2023, 4:34 PM · 1 point · 0 comments · 3 min read · LW link
Different views of alignment have different consequences for imperfect methods · Stuart_Armstrong · Sep 28, 2023, 4:31 PM · 31 points · 0 comments · 1 min read · LW link
AI #31: It Can Do What Now? · Zvi · Sep 28, 2023, 4:00 PM · 90 points · 6 comments · 40 min read · LW link · (thezvi.wordpress.com)
The point of a game is not to win, and you shouldn’t even pretend that it is · mako yass · Sep 28, 2023, 3:54 PM · 51 points · 27 comments · 4 min read · LW link · (makopool.com)
Cohabitive Games so Far · mako yass · Sep 28, 2023, 3:41 PM · 131 points · 146 comments · 19 min read · LW link · 2 reviews · (makopool.com)
Wobbly Table Theorem in Practice · Morpheus · Sep 28, 2023, 2:33 PM · 24 points · 0 comments · 2 min read · LW link
Weighing Animal Worth · jefftk · Sep 28, 2023, 1:50 PM · 25 points · 11 comments · 2 min read · LW link · (www.jefftk.com)
ARC Evals: Responsible Scaling Policies · Zach Stein-Perlman · Sep 28, 2023, 4:30 AM · 40 points · 10 comments · 2 min read · LW link · 1 review · (evals.alignment.org)
Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it) · Ruby · Sep 28, 2023, 2:48 AM · 66 points · 73 comments · 6 min read · LW link
Jimmy Apples, source of the rumor that OpenAI has achieved AGI internally, is a credible insider. · Jorterder · Sep 28, 2023, 1:20 AM · −6 points · 2 comments · 1 min read · LW link · (twitter.com)
Investigating the rumors of OpenAI achieving AGI · Jorterder · 28 Sep 2023 1:17 UTC · −4 points · 1 comment · 1 min read · LW link
Alibaba Group releases Qwen, 14B parameter LLM · Nikola Jurkovic · 28 Sep 2023 0:12 UTC · 5 points · 1 comment · 1 min read · LW link · (qianwen-res.oss-cn-beijing.aliyuncs.com)
Metaculus Launches 2023/2024 FluSight Challenge Supporting CDC, $5K in Prizes · ChristianWilliams · 27 Sep 2023 21:35 UTC · 5 points · 0 comments · LW link · (www.metaculus.com)
Projects I would like to see (possibly at AI Safety Camp) · Linda Linsefors · 27 Sep 2023 21:27 UTC · 22 points · 12 comments · 4 min read · LW link
Towards Better Milestones for Monitoring AI Capabilities · snewman · 27 Sep 2023 21:18 UTC · 11 points · 0 comments · 14 min read · LW link
[Question] Is Bjorn Lomborg roughly right about climate change policy? · yhoiseth · 27 Sep 2023 20:06 UTC · 29 points · 14 comments · 2 min read · LW link · (www.sciencedirect.com)
Commonsense Good, Creative Good · jefftk · 27 Sep 2023 19:50 UTC · 44 points · 11 comments · 3 min read · LW link · (www.jefftk.com)
Petrov Day [Spoiler Warning] · lsusr · 27 Sep 2023 19:20 UTC · 6 points · 6 comments · 1 min read · LW link
The Hidden Complexity of Wishes—The Animation · Writer · 27 Sep 2023 17:59 UTC · 33 points · 0 comments · 1 min read · LW link · (youtu.be)