Page 2
Why deceptive alignment matters for AGI safety
Marius Hobbhahn
Sep 15, 2022, 1:38 PM
68
points
13
comments
13
min read
LW
link
Path dependence in ML inductive biases
Vivek Hebbar and evhub
Sep 10, 2022, 1:38 AM
68
points
13
comments
10
min read
LW
link
Self-Control Secrets of the Puritan Masters
David Hugh-Jones
Sep 26, 2022, 9:04 AM
67
points
3
comments
5
min read
LW
link
(wyclif.substack.com)
Quintin’s alignment papers roundup—week 2
Quintin Pope
Sep 19, 2022, 1:41 PM
67
points
2
comments
10
min read
LW
link
LOVE in a simbox is all you need
jacob_cannell
Sep 28, 2022, 6:25 PM
66
points
73
comments
44
min read
LW
link
1
review
Where I currently disagree with Ryan Greenblatt’s version of the ELK approach
So8res
Sep 29, 2022, 9:18 PM
65
points
7
comments
5
min read
LW
link
Book review: “The Heart of the Brain: The Hypothalamus and Its Hormones”
Steven Byrnes
Sep 27, 2022, 1:20 PM
65
points
3
comments
18
min read
LW
link
A game of mattering
KatjaGrace
Sep 23, 2022, 2:30 AM
64
points
7
comments
5
min read
LW
link
(worldspiritsockpuppet.com)
Clarifying the Agent-Like Structure Problem
johnswentworth
Sep 29, 2022, 9:28 PM
63
points
17
comments
6
min read
LW
link
[Closed] Prize and fast track to alignment research at ALTER
Vanessa Kosoy
Sep 17, 2022, 4:58 PM
63
points
8
comments
3
min read
LW
link
Private alignment research sharing and coordination
porby
Sep 4, 2022, 12:01 AM
62
points
13
comments
5
min read
LW
link
Infra-Exercises, Part 1
Diffractor, Jack Parker and Connall Garrod
Sep 1, 2022, 5:06 AM
62
points
10
comments
1
min read
LW
link
Review of Examine.com’s vitamin write-ups
Elizabeth and Martin Bernstorff
Sep 26, 2022, 11:40 PM
60
points
1
comment
5
min read
LW
link
(acesounderglass.com)
Gradient Hacker Design Principles From Biology
johnswentworth
Sep 1, 2022, 7:03 PM
60
points
13
comments
3
min read
LW
link
Fake qualities of mind
Kaj_Sotala
Sep 22, 2022, 4:40 PM
59
points
2
comments
2
min read
LW
link
(kajsotala.fi)
Argument against 20% GDP growth from AI within 10 years [Linkpost]
aog
Sep 12, 2022, 4:08 AM
59
points
20
comments
5
min read
LW
link
(twitter.com)
Replacement for PONR concept
Daniel Kokotajlo
Sep 2, 2022, 12:09 AM
59
points
6
comments
2
min read
LW
link
Levelling Up in AI Safety Research Engineering
Gabe M
Sep 2, 2022, 4:59 AM
58
points
9
comments
17
min read
LW
link
QAPR 3: interpretability-guided training of neural nets
Quintin Pope
Sep 28, 2022, 4:02 PM
58
points
2
comments
10
min read
LW
link
Deep Q-Networks Explained
Jay Bailey
Sep 13, 2022, 12:01 PM
58
points
8
comments
20
min read
LW
link
Two reasons we might be closer to solving alignment than it seems
KatWoods and AmberDawn
Sep 24, 2022, 8:00 PM
57
points
9
comments
4
min read
LW
link
Why was progress so slow in the past?
jasoncrawford
Sep 1, 2022, 8:26 PM
54
points
31
comments
6
min read
LW
link
(rootsofprogress.org)
Methodological Therapy: An Agenda For Tackling Research Bottlenecks
adamShimi, Lucas Teixeira and remember
Sep 22, 2022, 6:41 PM
54
points
6
comments
9
min read
LW
link
We may be able to see sharp left turns coming
Ethan Perez and Neel Nanda
Sep 3, 2022, 2:55 AM
54
points
29
comments
1
min read
LW
link
First we shape our social graph; then it shapes us
Henrik Karlsson
Sep 7, 2022, 3:50 PM
53
points
6
comments
8
min read
LW
link
(escapingflatland.substack.com)
Many therapy schools work with inner multiplicity (not just IFS)
David Althaus and Ewelina Tur
Sep 17, 2022, 10:27 AM
52
points
16
comments
18
min read
LW
link
Coordinate-Free Interpretability Theory
johnswentworth
Sep 14, 2022, 11:33 PM
52
points
17
comments
5
min read
LW
link
ACT-1: Transformer for Actions
Daniel Kokotajlo
Sep 14, 2022, 7:09 PM
52
points
4
comments
1
min read
LW
link
(www.adept.ai)
When does technical work to reduce AGI conflict make a difference?: Introduction
JesseClifton, Sammy Martin and Anthony DiGiovanni
Sep 14, 2022, 7:38 PM
52
points
3
comments
6
min read
LW
link
When would AGIs engage in conflict?
JesseClifton, Sammy Martin and Anthony DiGiovanni
Sep 14, 2022, 7:38 PM
52
points
5
comments
13
min read
LW
link
EA & LW Forums Weekly Summary (28 Aug − 3 Sep 22’)
Zoe Williams
Sep 6, 2022, 11:06 AM
51
points
2
comments
14
min read
LW
link
My Thoughts on the ML Safety Course
zeshen
Sep 27, 2022, 1:15 PM
50
points
3
comments
17
min read
LW
link
Some notes on solving hard problems
Joe Rocca
Sep 19, 2022, 12:58 PM
50
points
8
comments
29
min read
LW
link
Dan Luu on Futurist Predictions
RobertM
Sep 14, 2022, 3:01 AM
50
points
9
comments
5
min read
LW
link
(danluu.com)
Soft skills for meetups
mingyuan
Sep 27, 2022, 5:26 PM
49
points
3
comments
5
min read
LW
link
Brief Notes on Transformers
Adam Jermyn
Sep 26, 2022, 2:46 PM
48
points
3
comments
2
min read
LW
link
Understanding and avoiding value drift
TurnTrout
Sep 9, 2022, 4:16 AM
48
points
14
comments
6
min read
LW
link
Covid 9/29/22: The Jones Act Waver
Zvi
Sep 29, 2022, 6:20 PM
47
points
10
comments
24
min read
LW
link
(thezvi.wordpress.com)
A Library and Tutorial for Factored Cognition with Language Models
stuhlmueller, justin_dan and goodgravy
Sep 28, 2022, 6:15 PM
47
points
0
comments
1
min read
LW
link
Scraping training data for your mind
Henrik Karlsson
Sep 21, 2022, 4:27 PM
47
points
4
comments
8
min read
LW
link
(escapingflatland.substack.com)
Estimating the Current and Future Number of AI Safety Researchers
Stephen McAleese
Sep 28, 2022, 9:11 PM
47
points
14
comments
9
min read
LW
link
(forum.effectivealtruism.org)
Prize idea: Transmit MIRI and Eliezer’s worldviews
elifland
Sep 19, 2022, 9:21 PM
47
points
18
comments
2
min read
LW
link
[An email with a bunch of links I sent an experienced ML researcher interested in learning about Alignment / x-safety.]
David Scott Krueger (formerly: capybaralet)
Sep 8, 2022, 10:28 PM
47
points
1
comment
5
min read
LW
link
Pretending not to Notice
jefftk
Sep 19, 2022, 2:30 AM
46
points
12
comments
2
min read
LW
link
(www.jefftk.com)
AI Safety field-building projects I’d like to see
Orpheus16
Sep 11, 2022, 11:43 PM
46
points
8
comments
6
min read
LW
link
AI Risk Intro 1: Advanced AI Might Be Very Bad
CallumMcDougall and L Rudolf L
Sep 11, 2022, 10:57 AM
46
points
13
comments
30
min read
LW
link
Alignment via prosocial brain algorithms
Cameron Berg
Sep 12, 2022, 1:48 PM
45
points
30
comments
6
min read
LW
link
It matters when the first sharp left turn happens
Adam Jermyn
Sep 29, 2022, 8:12 PM
45
points
9
comments
4
min read
LW
link
Samotsvety’s AI risk forecasts
elifland
Sep 9, 2022, 4:01 AM
44
points
0
comments
4
min read
LW
link
Searching for Modularity in Large Language Models
NickyP and Stephen Fowler
Sep 8, 2022, 2:25 AM
44
points
3
comments
14
min read
LW
link