- Revisiting the Manifold Hypothesis, by Aidan Rocke (Oct 1, 2023, 11:55 PM). 13 points, 19 comments, 4 min read.
- AI Alignment Breakthroughs this Week [new substack], by Logan Zoellner (Oct 1, 2023, 10:13 PM). 0 points, 8 comments, 2 min read.
- [Question] Looking for study, by Robert Feinstein (Oct 1, 2023, 7:52 PM). 4 points, 0 comments, 1 min read.
- Join AISafety.info’s Distillation Hackathon (Oct 6-9th), by smallsilo (Oct 1, 2023, 6:43 PM). 21 points, 0 comments, 2 min read. (forum.effectivealtruism.org)
- Fifty Flips, by abstractapplic (Oct 1, 2023, 3:30 PM). 33 points, 15 comments, 1 min read, 1 review. (h-b-p.github.io)
- AI Safety Impact Markets: Your Charity Evaluator for AI Safety, by Dawn Drescher (Oct 1, 2023, 10:47 AM). 16 points, 5 comments. (impactmarkets.substack.com)
- “Absence of Evidence is Not Evidence of Absence” As a Limit, by transhumanist_atom_understander (Oct 1, 2023, 8:15 AM). 16 points, 1 comment, 2 min read.
- New Tool: the Residual Stream Viewer, by AdamYedidia (Oct 1, 2023, 12:49 AM). 32 points, 7 comments, 4 min read. (tinyurl.com)
- My Effortless Weightloss Story: A Quick Runthrough, by CuoreDiVetro (Sep 30, 2023, 11:02 PM). 123 points, 78 comments, 9 min read.
- Arguments for moral indefinability, by Richard_Ngo (Sep 30, 2023, 10:40 PM). 47 points, 16 comments, 7 min read. (www.thinkingcomplete.com)
- Conditionals All The Way Down, by lunatic_at_large (Sep 30, 2023, 9:06 PM). 33 points, 2 comments, 3 min read.
- Focusing your impact on short vs long TAI timelines, by kuhanj (Sep 30, 2023, 7:34 PM). 4 points, 0 comments, 10 min read.
- How model editing could help with the alignment problem, by Michael Ripa (Sep 30, 2023, 5:47 PM). 12 points, 1 comment, 15 min read.
- My submission to the ALTER Prize, by Lorxus (Sep 30, 2023, 4:07 PM). 6 points, 0 comments, 1 min read. (www.docdroid.net)
- Anki deck for learning the main AI safety orgs, projects, and programs, by Bryce Robertson (Sep 30, 2023, 4:06 PM). 2 points, 0 comments, 1 min read.
- The Lighthaven Campus is open for bookings, by habryka (Sep 30, 2023, 1:08 AM). 209 points, 18 comments, 4 min read. (www.lighthaven.space)
- Headphones hook, by philh (Sep 29, 2023, 10:50 PM). 21 points, 1 comment, 3 min read. (reasonableapproximation.net)
- Paul Christiano’s views on “doom” (video explainer), by Michaël Trazzi (Sep 29, 2023, 9:56 PM). 15 points, 0 comments, 1 min read. (youtu.be)
- The Retroactive Funding Landscape: Innovations for Donors and Grantmakers, by Dawn Drescher (Sep 29, 2023, 5:39 PM). 13 points, 0 comments. (impactmarkets.substack.com)
- Bids To Defer On Value Judgements, by johnswentworth (Sep 29, 2023, 5:07 PM). 58 points, 6 comments, 3 min read.
- Announcing FAR Labs, an AI safety coworking space, by Ben Goldhaber (Sep 29, 2023, 4:52 PM). 95 points, 0 comments, 1 min read.
- A tool for searching rationalist & EA webs, by Daniel_Friedrich (Sep 29, 2023, 3:23 PM). 4 points, 0 comments, 1 min read. (ratsearch.blogspot.com)
- Basic Mathematics of Predictive Coding, by Adam Shai (Sep 29, 2023, 2:38 PM). 49 points, 6 comments, 9 min read.
- “Diamondoid bacteria” nanobots: deadly threat or dead-end? A nanotech investigation, by titotal (Sep 29, 2023, 2:01 PM). 160 points, 79 comments. (titotal.substack.com)
- Steering subsystems: capabilities, agency, and alignment, by Seth Herd (Sep 29, 2023, 1:45 PM). 31 points, 0 comments, 8 min read.
- Apply to Usable Security Prize by September 30, by Allison Duettmann (Sep 29, 2023, 1:39 PM). 4 points, 0 comments, 1 min read.
- List of how people have become more hard-working, by Chi Nguyen (Sep 29, 2023, 11:30 AM). 69 points, 7 comments.
- Resolving moral uncertainty with randomization, by B Jacobs and Jobst Heitzig (Sep 29, 2023, 11:23 AM). 7 points, 1 comment, 11 min read.
- EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem, by Elizabeth (Sep 28, 2023, 11:30 PM). 323 points, 250 comments, 22 min read, 2 reviews. (acesounderglass.com)
- Competitive, Cooperative, and Cohabitive, by Screwtape (Sep 28, 2023, 11:25 PM). 49 points, 13 comments, 5 min read, 1 review.
- The Coming Wave, by PeterMcCluskey (Sep 28, 2023, 10:59 PM). 27 points, 1 comment, 6 min read. (bayesianinvestor.com)
- High-level interpretability: detecting an AI’s objectives, by Paul Colognese and Jozdien (Sep 28, 2023, 7:30 PM). 72 points, 4 comments, 21 min read.
- How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions, by JanB, Owain_Evans and SoerenMind (Sep 28, 2023, 6:53 PM). 187 points, 39 comments, 3 min read, 1 review.
- Responsible scaling policy TLDR, by lemonhope (Sep 28, 2023, 6:51 PM). 9 points, 0 comments, 1 min read.
- Alignment Workshop talks, by Richard_Ngo (Sep 28, 2023, 6:26 PM). 37 points, 1 comment, 1 min read. (www.alignment-workshop.com)
- My Current Thoughts on the AI Strategic Landscape, by Jeffrey Heninger (Sep 28, 2023, 5:59 PM). 11 points, 28 comments, 14 min read.
- My Arrogant Plan for Alignment, by MrArrogant (Sep 28, 2023, 5:51 PM). 2 points, 6 comments, 6 min read.
- Discursive Competence in ChatGPT, Part 2: Memory for Texts, by Bill Benzon (Sep 28, 2023, 4:34 PM). 1 point, 0 comments, 3 min read.
- Different views of alignment have different consequences for imperfect methods, by Stuart_Armstrong (Sep 28, 2023, 4:31 PM). 31 points, 0 comments, 1 min read.
- AI #31: It Can Do What Now?, by Zvi (Sep 28, 2023, 4:00 PM). 90 points, 6 comments, 40 min read. (thezvi.wordpress.com)
- The point of a game is not to win, and you shouldn’t even pretend that it is, by mako yass (Sep 28, 2023, 3:54 PM). 51 points, 27 comments, 4 min read. (makopool.com)
- Cohabitive Games so Far, by mako yass (Sep 28, 2023, 3:41 PM). 131 points, 146 comments, 19 min read, 2 reviews. (makopool.com)
- Wobbly Table Theorem in Practice, by Morpheus (Sep 28, 2023, 14:33 UTC). 24 points, 0 comments, 2 min read.
- Weighing Animal Worth, by jefftk (Sep 28, 2023, 13:50 UTC). 25 points, 11 comments, 2 min read. (www.jefftk.com)
- ARC Evals: Responsible Scaling Policies, by Zach Stein-Perlman (Sep 28, 2023, 4:30 UTC). 40 points, 10 comments, 2 min read, 1 review. (evals.alignment.org)
- Petrov Day Retrospective, 2023 (re: the most important virtue of Petrov Day & unilaterally promoting it), by Ruby (Sep 28, 2023, 2:48 UTC). 66 points, 73 comments, 6 min read.
- Jimmy Apples, source of the rumor that OpenAI has achieved AGI internally, is a credible insider., by Jorterder (Sep 28, 2023, 1:20 UTC). −6 points, 2 comments, 1 min read. (twitter.com)
- Investigating the rumors of OpenAI achieving AGI, by Jorterder (Sep 28, 2023, 1:17 UTC). −4 points, 1 comment, 1 min read.
- Alibaba Group releases Qwen, 14B parameter LLM, by Nikola Jurkovic (Sep 28, 2023, 0:12 UTC). 5 points, 1 comment, 1 min read. (qianwen-res.oss-cn-beijing.aliyuncs.com)
- Metaculus Launches 2023/2024 FluSight Challenge Supporting CDC, $5K in Prizes, by ChristianWilliams (Sep 27, 2023, 21:35 UTC). 5 points, 0 comments. (www.metaculus.com)