[Question] Who determines whether an alignment proposal is the definitive alignment solution? · MiguelDev · Oct 3, 2023, 10:39 PM · −1 points · 6 comments · 1 min read · LW link
AXRP Episode 25 - Cooperative AI with Caspar Oesterheld · DanielFilan · Oct 3, 2023, 9:50 PM · 43 points · 0 comments · 92 min read · LW link
When to Get the Booster? · jefftk · Oct 3, 2023, 9:00 PM · 50 points · 15 comments · 2 min read · LW link · (www.jefftk.com)
OpenAI-Microsoft partnership · Zach Stein-Perlman · Oct 3, 2023, 8:01 PM · 51 points · 19 comments · 1 min read · LW link
[Question] Current AI safety techniques? · Zach Stein-Perlman · Oct 3, 2023, 7:30 PM · 30 points · 2 comments · 2 min read · LW link
Testing and Automation for Intelligent Systems. · Sai Kiran Kammari · Oct 3, 2023, 5:51 PM · −13 points · 0 comments · 1 min read · LW link · (resource-cms.springernature.com)
Metaculus Announces Forecasting Tournament to Evaluate Focused Research Organizations, in Partnership With the Federation of American Scientists · ChristianWilliams · Oct 3, 2023, 4:44 PM · 13 points · 0 comments · LW link · (www.metaculus.com)
What would it mean to understand how a large language model (LLM) works? Some quick notes. · Bill Benzon · Oct 3, 2023, 3:11 PM · 20 points · 4 comments · 8 min read · LW link
[Question] Potential alignment targets for a sovereign superintelligent AI · Paul Colognese · Oct 3, 2023, 3:09 PM · 29 points · 4 comments · 1 min read · LW link
Monthly Roundup #11: October 2023 · Zvi · Oct 3, 2023, 2:10 PM · 42 points · 12 comments · 35 min read · LW link · (thezvi.wordpress.com)
Why We Use Money? - A Walrasian View · Savio Coelho · Oct 3, 2023, 12:02 PM · 4 points · 3 comments · 8 min read · LW link
Mech Interp Challenge: October—Deciphering the Sorted List Model · CallumMcDougall · Oct 3, 2023, 10:57 AM · 23 points · 0 comments · 3 min read · LW link
Early Experiments in Reward Model Interpretation Using Sparse Autoencoders · lukemarks, Amirali Abdullah, Rauno Arike, Fazl and nothoughtsheadempty · Oct 3, 2023, 7:45 AM · 17 points · 0 comments · 5 min read · LW link
Some Quick Follow-Up Experiments to “Taken out of context: On measuring situational awareness in LLMs” · Miles Turpin · Oct 3, 2023, 2:22 AM · 31 points · 0 comments · 9 min read · LW link
My Mid-Career Transition into Biosecurity · jefftk · Oct 2, 2023, 9:20 PM · 26 points · 4 comments · 2 min read · LW link · (www.jefftk.com)
Dall-E 3 · p.b. · Oct 2, 2023, 8:33 PM · 37 points · 9 comments · 1 min read · LW link · (openai.com)
Thomas Kwa’s MIRI research experience · Thomas Kwa, peterbarnett, Vivek Hebbar, Jeremy Gillen, Bird Concept and Raemon · Oct 2, 2023, 4:42 PM · 173 points · 53 comments · 1 min read · LW link
Population After a Catastrophe · Stan Pinsent · Oct 2, 2023, 4:06 PM · 3 points · 5 comments · 14 min read · LW link
Expectations for Gemini: hopefully not a big deal · Maxime Riché · Oct 2, 2023, 3:38 PM · 15 points · 5 comments · 1 min read · LW link
A counterexample for measurable factor spaces · Matthias G. Mayer · Oct 2, 2023, 3:16 PM · 14 points · 0 comments · 3 min read · LW link
Will early transformative AIs primarily use text? [Manifold question] · Fabien Roger · Oct 2, 2023, 3:05 PM · 16 points · 0 comments · 3 min read · LW link
energy landscapes of experts · bhauth · Oct 2, 2023, 2:08 PM · 45 points · 2 comments · 3 min read · LW link · (www.bhauth.com)
Direction of Fit · NicholasKees · Oct 2, 2023, 12:34 PM · 34 points · 0 comments · 3 min read · LW link
The 99% principle for personal problems · Kaj_Sotala · Oct 2, 2023, 8:20 AM · 141 points · 20 comments · 2 min read · LW link · (kajsotala.fi)
Linkpost: They Studied Dishonesty. Was Their Work a Lie? · Linch · Oct 2, 2023, 8:10 AM · 91 points · 12 comments · 2 min read · LW link · (www.newyorker.com)
Why I got the smallpox vaccine in 2023 · joec · Oct 2, 2023, 5:11 AM · 25 points · 6 comments · 4 min read · LW link
Instrumental Convergence and human extinction. · Spiritus Dei · Oct 2, 2023, 12:41 AM · −10 points · 3 comments · 7 min read · LW link
Revisiting the Manifold Hypothesis · Aidan Rocke · Oct 1, 2023, 11:55 PM · 13 points · 19 comments · 4 min read · LW link
AI Alignment Breakthroughs this Week [new substack] · Logan Zoellner · Oct 1, 2023, 10:13 PM · 0 points · 8 comments · 2 min read · LW link
[Question] Looking for study · Robert Feinstein · Oct 1, 2023, 7:52 PM · 4 points · 0 comments · 1 min read · LW link
Join AISafety.info’s Distillation Hackathon (Oct 6-9th) · smallsilo · Oct 1, 2023, 6:43 PM · 21 points · 0 comments · 2 min read · LW link · (forum.effectivealtruism.org)
Fifty Flips · abstractapplic · Oct 1, 2023, 3:30 PM · 33 points · 15 comments · 1 min read · LW link · 1 review · (h-b-p.github.io)
AI Safety Impact Markets: Your Charity Evaluator for AI Safety · Dawn Drescher · Oct 1, 2023, 10:47 AM · 16 points · 5 comments · LW link · (impactmarkets.substack.com)
“Absence of Evidence is Not Evidence of Absence” As a Limit · transhumanist_atom_understander · Oct 1, 2023, 8:15 AM · 16 points · 1 comment · 2 min read · LW link
New Tool: the Residual Stream Viewer · AdamYedidia · Oct 1, 2023, 12:49 AM · 32 points · 7 comments · 4 min read · LW link · (tinyurl.com)
My Effortless Weightloss Story: A Quick Runthrough · CuoreDiVetro · Sep 30, 2023, 11:02 PM · 123 points · 78 comments · 9 min read · LW link
Arguments for moral indefinability · Richard_Ngo · Sep 30, 2023, 10:40 PM · 47 points · 16 comments · 7 min read · LW link · (www.thinkingcomplete.com)
Conditionals All The Way Down · lunatic_at_large · Sep 30, 2023, 9:06 PM · 33 points · 2 comments · 3 min read · LW link
Focusing your impact on short vs long TAI timelines · kuhanj · Sep 30, 2023, 7:34 PM · 4 points · 0 comments · 10 min read · LW link
How model editing could help with the alignment problem · Michael Ripa · Sep 30, 2023, 5:47 PM · 12 points · 1 comment · 15 min read · LW link
My submission to the ALTER Prize · Lorxus · Sep 30, 2023, 4:07 PM · 6 points · 0 comments · 1 min read · LW link · (www.docdroid.net)
Anki deck for learning the main AI safety orgs, projects, and programs · Bryce Robertson · Sep 30, 2023, 4:06 PM · 2 points · 0 comments · 1 min read · LW link
The Lighthaven Campus is open for bookings · habryka · Sep 30, 2023, 1:08 AM · 209 points · 18 comments · 4 min read · LW link · (www.lighthaven.space)
Headphones hook · philh · Sep 29, 2023, 10:50 PM · 21 points · 1 comment · 3 min read · LW link · (reasonableapproximation.net)
Paul Christiano’s views on “doom” (video explainer) · Michaël Trazzi · Sep 29, 2023, 9:56 PM · 15 points · 0 comments · 1 min read · LW link · (youtu.be)
The Retroactive Funding Landscape: Innovations for Donors and Grantmakers · Dawn Drescher · Sep 29, 2023, 5:39 PM · 13 points · 0 comments · LW link · (impactmarkets.substack.com)
Bids To Defer On Value Judgements · johnswentworth · Sep 29, 2023, 5:07 PM · 58 points · 6 comments · 3 min read · LW link
Announcing FAR Labs, an AI safety coworking space · Ben Goldhaber · Sep 29, 2023, 4:52 PM · 95 points · 0 comments · 1 min read · LW link
A tool for searching rationalist & EA webs · Daniel_Friedrich · Sep 29, 2023, 3:23 PM · 4 points · 0 comments · 1 min read · LW link · (ratsearch.blogspot.com)
Basic Mathematics of Predictive Coding · Adam Shai · Sep 29, 2023, 2:38 PM · 49 points · 6 comments · 9 min read · LW link