Brute Force Manufactured Consensus is Hiding the Crime of the Century
Roko
Feb 3, 2024, 8:36 PM
209
points
156
comments
9
min read
LW
link
My thoughts on the Beff Jezos—Connor Leahy debate
kwiat.dev
Feb 3, 2024, 7:47 PM
−5
points
23
comments
4
min read
LW
link
The Journal of Dangerous Ideas
rogersbacon
Feb 3, 2024, 3:40 PM
−25
points
4
comments
5
min read
LW
link
(www.secretorum.life)
Attitudes about Applied Rationality
Camille Berger
Feb 3, 2024, 2:42 PM
108
points
18
comments
4
min read
LW
link
Practicing my Handwriting in 1439
Maxwell Tabarrok
Feb 3, 2024, 1:21 PM
11
points
0
comments
3
min read
LW
link
(www.maximum-progress.com)
Finite Factored Sets to Bayes Nets Part 2
J Bostock
Feb 3, 2024, 12:25 PM
6
points
0
comments
8
min read
LW
link
Why I no longer identify as transhumanist
Kaj_Sotala
Feb 3, 2024, 12:00 PM
55
points
33
comments
3
min read
LW
link
(kajsotala.fi)
Attention SAEs Scale to GPT-2 Small
Connor Kissane, robertzk, Arthur Conmy and Neel Nanda
Feb 3, 2024, 6:50 AM
78
points
4
comments
8
min read
LW
link
Why do we need RLHF? Imitation, Inverse RL, and the role of reward
Ran W
Feb 3, 2024, 4:00 AM
16
points
0
comments
5
min read
LW
link
Announcing the London Initiative for Safe AI (LISA)
James Fox, mike_safeAI and Ryan Kidd
Feb 2, 2024, 11:17 PM
98
points
0
comments
9
min read
LW
link
Survey for alignment researchers!
Cameron Berg, Judd Rosenblatt and AE Studio
Feb 2, 2024, 8:41 PM
71
points
11
comments
1
min read
LW
link
Voting Results for the 2022 Review
Ben Pace
Feb 2, 2024, 8:34 PM
57
points
3
comments
73
min read
LW
link
On Dwarkesh’s 3rd Podcast With Tyler Cowen
Zvi
Feb 2, 2024, 7:30 PM
36
points
9
comments
21
min read
LW
link
(thezvi.wordpress.com)
Most experts believe COVID-19 was probably not a lab leak
DanielFilan
Feb 2, 2024, 7:28 PM
66
points
89
comments
2
min read
LW
link
(gcrinstitute.org)
What Failure Looks Like is not an existential risk (and alignment is not the solution)
otto.barten
Feb 2, 2024, 6:59 PM
13
points
12
comments
9
min read
LW
link
Solving alignment isn’t enough for a flourishing future
mic
Feb 2, 2024, 6:23 PM
27
points
0
comments
LW
link
(papers.ssrn.com)
Manifold Markets
PeterMcCluskey
Feb 2, 2024, 5:48 PM
26
points
9
comments
4
min read
LW
link
(bayesianinvestor.com)
Types of subjective welfare
MichaelStJules
Feb 2, 2024, 9:56 AM
10
points
3
comments
LW
link
Open Source Sparse Autoencoders for all Residual Stream Layers of GPT2-Small
Joseph Bloom
Feb 2, 2024, 6:54 AM
103
points
37
comments
15
min read
LW
link
Soft Prompts for Evaluation: Measuring Conditional Distance of Capabilities
porby
Feb 2, 2024, 5:49 AM
47
points
1
comment
4
min read
LW
link
(1drv.ms)
Running a Prediction Market Mafia Game
Arjun Panickssery
Feb 1, 2024, 11:24 PM
22
points
5
comments
1
min read
LW
link
(arjunpanickssery.substack.com)
Evaluating Stability of Unreflective Alignment
james.lucassen
Feb 1, 2024, 10:15 PM
57
points
12
comments
18
min read
LW
link
(jlucassen.com)
Davidad’s Provably Safe AI Architecture—ARIA’s Programme Thesis
simeon_c
Feb 1, 2024, 9:30 PM
69
points
17
comments
1
min read
LW
link
(www.aria.org.uk)
Alignment has a Basin of Attraction: Beyond the Orthogonality Thesis
RogerDearnaley
Feb 1, 2024, 9:15 PM
16
points
15
comments
13
min read
LW
link
Wrong answer bias
lemonhope
Feb 1, 2024, 8:05 PM
78
points
23
comments
1
min read
LW
link
On Not Requiring Vaccination
jefftk
Feb 1, 2024, 7:20 PM
31
points
21
comments
1
min read
LW
link
(www.jefftk.com)
The economy is mostly newbs (strat predictions)
lemonhope
Feb 1, 2024, 7:15 PM
27
points
6
comments
2
min read
LW
link
Managing risks while trying to do good
Wei Dai
Feb 1, 2024, 6:08 PM
63
points
26
comments
LW
link
Putting multimodal LLMs to the Tetris test
Lovre and gabrielagc
Feb 1, 2024, 4:02 PM
30
points
5
comments
7
min read
LW
link
AI #49: Bioweapon Testing Begins
Zvi
Feb 1, 2024, 3:30 PM
37
points
11
comments
42
min read
LW
link
(thezvi.wordpress.com)
Some Notes on Ethics
Pareto Optimal
Feb 1, 2024, 10:18 AM
−3
points
0
comments
1
min read
LW
link
(paretooptimal.substack.com)
Increasingly vague interpersonal welfare comparisons
MichaelStJules
Feb 1, 2024, 6:45 AM
5
points
0
comments
LW
link
PIBBSS Speaker events coming up in February
DusanDNesic, Nora_Ammann and Lucas Teixeira
Feb 1, 2024, 3:28 AM
10
points
2
comments
1
min read
LW
link
Drone Wars Endgame
RussellThor
Feb 1, 2024, 2:30 AM
36
points
71
comments
8
min read
LW
link
Sequencing Swabs
jefftk
Feb 1, 2024, 1:50 AM
19
points
1
comment
5
min read
LW
link
(www.jefftk.com)
Leading The Parade
johnswentworth
Jan 31, 2024, 10:39 PM
148
points
31
comments
9
min read
LW
link
Proposal for an AI Safety Prize
sweenesm
Jan 31, 2024, 6:35 PM
3
points
0
comments
2
min read
LW
link
Literally Everything is Infinite
Spiral
Jan 31, 2024, 6:31 PM
−9
points
8
comments
5
min read
LW
link
What fuels your ambition?
Cissy
Jan 31, 2024, 6:30 PM
29
points
1
comment
5
min read
LW
link
(www.moremyself.xyz)
“Genlangs” and Zipf’s Law: Do languages generated by ChatGPT statistically look human?
Justin-Diamond
Jan 31, 2024, 6:30 PM
2
points
2
comments
1
min read
LW
link
(arxiv.org)
AI, Intellectual Property, and the Techno-Optimist Revolution
Justin-Diamond
Jan 31, 2024, 6:30 PM
1
point
0
comments
1
min read
LW
link
(www.researchgate.net)
My Alignment “Plan”: Avoid Strong Optimisation and Align Economy
VojtaKovarik
Jan 31, 2024, 5:03 PM
24
points
9
comments
7
min read
LW
link
Where freedom comes from
Logan Kieller
Jan 31, 2024, 4:53 PM
−5
points
1
comment
3
min read
LW
link
(logankieller.substack.com)
Per protocol analysis as medical malpractice
braces
Jan 31, 2024, 4:22 PM
53
points
8
comments
1
min read
LW
link
Adam Smith Meets AI Doomers
James_Miller
Jan 31, 2024, 3:53 PM
34
points
10
comments
5
min read
LW
link
Ten Modes of Culture War Discourse
jchan
Jan 31, 2024, 1:58 PM
54
points
15
comments
15
min read
LW
link
Without Fundamental Advances, Rebellion and Coup d’État are the Inevitable Outcomes of Dictators & Monarchs Trying to Control Large, Capable Countries
Roko
Jan 31, 2024, 10:14 AM
27
points
34
comments
1
min read
LW
link
Explaining Impact Markets
Saul Munn
Jan 31, 2024, 9:51 AM
95
points
2
comments
3
min read
LW
link
(www.brasstacks.blog)
Exploring OpenAI’s Latent Directions: Tests, Observations, and Poking Around
Johnny Lin
Jan 31, 2024, 6:01 AM
26
points
4
comments
14
min read
LW
link
Clip keys together with tiny carabiners
Brendan Long
Jan 31, 2024, 4:26 AM
11
points
5
comments
1
min read
LW
link