Higher-effort summer solstice: What if we used AI (i.e., Angel Island)?
Rachel Shu
Jun 25, 2024, 1:35 AM
46
points
9
comments
3
min read
LW
link
Rational Animations’ intro to mechanistic interpretability
Writer
Jun 14, 2024, 4:10 PM
45
points
1
comment
11
min read
LW
link
(youtu.be)
AI governance needs a theory of victory
Corin Katzke
and
Justin Bullock
Jun 21, 2024, 4:15 PM
45
points
8
comments
LW
link
(www.convergenceanalysis.org)
Sci-Fi books micro-reviews
Yair Halberstadt
Jun 24, 2024, 9:49 AM
44
points
27
comments
4
min read
LW
link
Debate, Oracles, and Obfuscated Arguments
Jonah Brown-Cohen
and
Geoffrey Irving
Jun 20, 2024, 11:14 PM
44
points
4
comments
21
min read
LW
link
Soviet comedy film recommendations
Nina Panickssery
Jun 9, 2024, 11:40 PM
42
points
11
comments
2
min read
LW
link
(open.substack.com)
D&D.Sci Alchemy: Archmage Anachronos and the Supply Chain Issues
aphyer
Jun 7, 2024, 7:02 PM
42
points
16
comments
3
min read
LW
link
Case studies on social-welfare-based standards in various industries
HoldenKarnofsky
Jun 20, 2024, 1:33 PM
42
points
0
comments
LW
link
When fine-tuning fails to elicit GPT-3.5’s chess abilities
Theodore Chapman
Jun 14, 2024, 6:50 PM
42
points
3
comments
9
min read
LW
link
Jailbreak steering generalization
Sarah Ball
and
Nina Panickssery
Jun 20, 2024, 5:25 PM
41
points
4
comments
2
min read
LW
link
(arxiv.org)
Book review: The Quincunx
cousin_it
Jun 5, 2024, 9:13 PM
41
points
12
comments
2
min read
LW
link
Surviving Seveneves
Yair Halberstadt
Jun 19, 2024, 1:11 PM
41
points
4
comments
11
min read
LW
link
Applying Force to the Wrong End of a Causal Chain
silentbob
Jun 22, 2024, 6:06 PM
41
points
0
comments
9
min read
LW
link
Beyond the Board: Exploring AI Robustness Through Go
AdamGleave
Jun 19, 2024, 4:40 PM
41
points
2
comments
1
min read
LW
link
(far.ai)
Long-Term Future Fund: May 2023 to March 2024 Payout recommendations
Linch
Jun 12, 2024, 1:46 PM
40
points
0
comments
LW
link
Progress Conference 2024: Toward Abundant Futures
jasoncrawford
Jun 26, 2024, 3:39 PM
40
points
2
comments
1
min read
LW
link
(rootsofprogress.org)
The Data Wall is Important
JustisMills
Jun 9, 2024, 10:54 PM
40
points
20
comments
2
min read
LW
link
(justismills.substack.com)
Is This Lie Detector Really Just a Lie Detector? An Investigation of LLM Probe Specificity.
Josh Levy
Jun 4, 2024, 3:45 PM
39
points
0
comments
18
min read
LW
link
AI #70: A Beautiful Sonnet
Zvi
Jun 27, 2024, 2:40 PM
38
points
0
comments
44
min read
LW
link
(thezvi.wordpress.com)
(Appetitive, Consummatory) ≈ (RL, reflex)
Steven Byrnes
Jun 15, 2024, 3:57 PM
38
points
1
comment
3
min read
LW
link
On DeepMind’s Frontier Safety Framework
Zvi
Jun 18, 2024, 1:30 PM
37
points
4
comments
8
min read
LW
link
(thezvi.wordpress.com)
Searching for the Root of the Tree of Evil
Ivan Vendrov
Jun 8, 2024, 5:05 PM
36
points
14
comments
5
min read
LW
link
(nothinghuman.substack.com)
Representation Tuning
Christopher Ackerman
Jun 27, 2024, 5:44 PM
35
points
9
comments
13
min read
LW
link
Empirical vs. Mathematical Joints of Nature
Elizabeth
and
Alex_Altair
Jun 26, 2024, 1:55 AM
35
points
1
comment
5
min read
LW
link
OpenAI appoints Retired U.S. Army General Paul M. Nakasone to Board of Directors
Joel Burget
Jun 13, 2024, 9:28 PM
35
points
10
comments
1
min read
LW
link
(openai.com)
Suffering Is Not Pain
jbkjr
Jun 18, 2024, 6:04 PM
34
points
45
comments
5
min read
LW
link
(jbkjr.me)
GPT2, Five Years On
Joel Burget
Jun 5, 2024, 5:44 PM
34
points
0
comments
3
min read
LW
link
(importai.substack.com)
AXRP Episode 33 - RLHF Problems with Scott Emmons
DanielFilan
Jun 12, 2024, 3:30 AM
34
points
0
comments
56
min read
LW
link
Attention Output SAEs Improve Circuit Analysis
Connor Kissane
,
robertzk
,
Arthur Conmy
and
Neel Nanda
Jun 21, 2024, 12:56 PM
33
points
3
comments
19
min read
LW
link
Book review: the Iliad
philh
Jun 18, 2024, 6:50 PM
31
points
2
comments
14
min read
LW
link
(reasonableapproximation.net)
Incentive Learning vs Dead Sea Salt Experiment
Steven Byrnes
Jun 25, 2024, 5:49 PM
30
points
1
comment
28
min read
LW
link
5. Open Corrigibility Questions
Max Harms
Jun 10, 2024, 2:09 PM
30
points
0
comments
7
min read
LW
link
“Full Automation” is a Slippery Metric
ozziegooen
Jun 11, 2024, 7:56 PM
30
points
1
comment
LW
link
[Question]
What are things you’re allowed to do as a startup?
Elizabeth
Jun 20, 2024, 12:01 AM
30
points
9
comments
1
min read
LW
link
A Case for Superhuman Governance, using AI
ozziegooen
Jun 7, 2024, 12:10 AM
30
points
0
comments
LW
link
DPO/PPO-RLHF on LLMs incentivizes sycophancy, exaggeration and deceptive hallucination, but not misaligned powerseeking
tailcalled
Jun 10, 2024, 9:20 PM
29
points
13
comments
2
min read
LW
link
Aggregative Principles of Social Justice
Cleo Nardo
Jun 5, 2024, 1:44 PM
29
points
10
comments
37
min read
LW
link
Offering Completion
jefftk
Jun 7, 2024, 1:40 AM
29
points
6
comments
1
min read
LW
link
(www.jefftk.com)
Evaporation of improvements
Viliam
Jun 20, 2024, 6:34 PM
29
points
27
comments
2
min read
LW
link
Childhood and Education Roundup #6: College Edition
Zvi
Jun 26, 2024, 11:40 AM
28
points
8
comments
23
min read
LW
link
(thezvi.wordpress.com)
Aggregative principles approximate utilitarian principles
Cleo Nardo
Jun 12, 2024, 4:27 PM
28
points
3
comments
23
min read
LW
link
Monthly Roundup #19: June 2024
Zvi
Jun 25, 2024, 12:00 PM
28
points
9
comments
54
min read
LW
link
(thezvi.wordpress.com)
Probably Not a Ghost Story
George Ingebretsen
Jun 12, 2024, 10:55 PM
27
points
4
comments
3
min read
LW
link
Appraising aggregativism and utilitarianism
Cleo Nardo
Jun 21, 2024, 11:10 PM
27
points
10
comments
19
min read
LW
link
An Intuitive Explanation of Sparse Autoencoders for Mechanistic Interpretability of LLMs
Adam Karvonen
Jun 25, 2024, 3:57 PM
27
points
0
comments
9
min read
LW
link
(adamkarvonen.github.io)
Sticker Shortcut Fallacy — The Real Worst Argument in the World
ymeskhout
Jun 12, 2024, 2:52 PM
27
points
15
comments
4
min read
LW
link
(www.ymeskhout.com)
my favourite Scott Sumner blog posts
DMMF
Jun 11, 2024, 2:40 PM
26
points
0
comments
3
min read
LW
link
(danfrank.ca)
[Question]
Thoughts on Francois Chollet’s belief that LLMs are far away from AGI?
O O
Jun 14, 2024, 6:32 AM
26
points
17
comments
1
min read
LW
link
Talk: AI safety fieldbuilding at MATS
Ryan Kidd
Jun 23, 2024, 11:06 PM
26
points
2
comments
10
min read
LW
link
3b. Formal (Faux) Corrigibility
Max Harms
Jun 9, 2024, 5:18 PM
26
points
13
comments
17
min read
LW
link