Letting Kids Be Kids
Zvi
May 30, 2025, 10:50 AM
71
points
14
comments
20
min read
LW
link
(thezvi.wordpress.com)
Regarding South Africa
Zvi
May 16, 2025, 4:10 PM
71
points
5
comments
11
min read
LW
link
(thezvi.wordpress.com)
Claude 4
Zach Stein-Perlman
May 22, 2025, 5:00 PM
71
points
24
comments
1
min read
LW
link
(www.anthropic.com)
Better Air Purifiers
jefftk
May 11, 2025, 4:50 PM
71
points
21
comments
3
min read
LW
link
(www.jefftk.com)
Negative Results on Group SAEs
Josh Engels
May 6, 2025, 9:49 PM
70
points
3
comments
8
min read
LW
link
Tsinghua paper: Does RL Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
Thomas Kwa
May 5, 2025, 6:56 PM
68
points
21
comments
2
min read
LW
link
(arxiv.org)
Learning (more) from horse employment history
Tim H
May 23, 2025, 2:11 AM
68
points
13
comments
5
min read
LW
link
Requiem for the hopes of a pre-AI world
Mitchell_Porter
May 27, 2025, 2:47 PM
68
points
0
comments
3
min read
LW
link
That’s Not How Epigenetic Modifications Work
johnswentworth
May 24, 2025, 12:15 AM
67
points
12
comments
2
min read
LW
link
What Does It Mean to “Write Like You Talk”?
Arjun Panickssery
May 15, 2025, 9:49 AM
67
points
8
comments
5
min read
LW
link
(arjunpanickssery.substack.com)
Working through a small tiling result
James Payor
May 13, 2025, 8:28 PM
66
points
9
comments
5
min read
LW
link
OpenAI Claims Nonprofit Will Retain Nominal Control
Zvi
May 7, 2025, 7:40 PM
65
points
4
comments
11
min read
LW
link
(thezvi.wordpress.com)
Interest In Conflict Is Instrumentally Convergent
Screwtape
May 9, 2025, 2:16 AM
65
points
58
comments
10
min read
LW
link
CFAR is running an experimental mini-workshop (June 2-6, Berkeley CA)!
Davis_Kingsley
May 29, 2025, 10:02 PM
64
points
2
comments
2
min read
LW
link
Beware the Moral Homophone
ymeskhout
May 27, 2025, 12:06 PM
63
points
4
comments
9
min read
LW
link
(www.ymeskhout.com)
Semen and Semantics: Understanding Porn with Language Embeddings
future_detective
May 19, 2025, 3:39 PM
63
points
27
comments
6
min read
LW
link
(github.com)
Things I Learned Making The SB-1047 Documentary
Michaël Trazzi
May 12, 2025, 5:41 PM
63
points
2
comments
2
min read
LW
link
Do you even have a system prompt? (PSA / repo)
Croissanthology
May 29, 2025, 6:49 PM
62
points
48
comments
2
min read
LW
link
Zuckerberg’s Dystopian AI Vision
Zvi
May 6, 2025, 1:50 PM
61
points
7
comments
11
min read
LW
link
(thezvi.wordpress.com)
Incorrect Baseline Evaluations Call into Question Recent LLM-RL Claims
shash42
May 29, 2025, 6:40 PM
61
points
5
comments
1
min read
LW
link
(safe-lip-9a8.notion.site)
Outcomes of the Geopolitical Singularity
Nikola Jurkovic
May 20, 2025, 6:09 PM
61
points
5
comments
5
min read
LW
link
OpenAI Preparedness Framework 2.0
Zvi
May 2, 2025, 1:10 PM
60
points
1
comment
23
min read
LW
link
(thezvi.wordpress.com)
Superhuman Coders in AI 2027 - Not So Fast
dschwarz
and
FutureSearch
May 1, 2025, 6:56 PM
59
points
0
comments
5
min read
LW
link
Why I am not a successionist
Nina Panickssery
May 4, 2025, 7:08 PM
59
points
48
comments
2
min read
LW
link
(ninapanickssery.substack.com)
Highly Opinionated Advice on How to Write ML Papers
Neel Nanda
May 12, 2025, 1:59 AM
59
points
4
comments
32
min read
LW
link
October The First Is Too Late
gwern
May 13, 2025, 9:45 PM
58
points
8
comments
1
min read
LW
link
(gwern.net)
New website analyzing AI companies’ model evals
Zach Stein-Perlman
May 26, 2025, 4:00 PM
58
points
0
comments
4
min read
LW
link
An alignment safety case sketch based on debate
Marie_DB
,
Jacob Pfau
,
Benjamin Hilton
and
Geoffrey Irving
May 8, 2025, 3:02 PM
57
points
19
comments
25
min read
LW
link
(arxiv.org)
Attend the 2025 Reproductive Frontiers Summit, June 10-12
TsviBT
and
Rachel Reid
May 9, 2025, 5:17 AM
57
points
0
comments
3
min read
LW
link
A widely shared AI productivity paper was retracted, is possibly fraudulent
titotal
May 19, 2025, 10:18 AM
56
points
4
comments
LW
link
GPT-4o Sycophancy Post Mortem
Zvi
May 5, 2025, 4:00 PM
55
points
1
comment
16
min read
LW
link
(thezvi.wordpress.com)
Orphaned Policies (Post 5 of 6 on AI Governance)
Mass_Driver
May 29, 2025, 9:42 PM
54
points
3
comments
16
min read
LW
link
Alignment Proposal: Adversarially Robust Augmentation and Distillation
Cole Wyeth
and
abramdemski
May 25, 2025, 12:58 PM
54
points
45
comments
13
min read
LW
link
The Need for Political Advertising (Post 2 of 6 on AI Governance)
Mass_Driver
May 21, 2025, 12:44 AM
54
points
2
comments
13
min read
LW
link
Socratic Persuasion: Giving Opinionated Yet Truth-Seeking Advice
Neel Nanda
May 26, 2025, 5:38 PM
53
points
12
comments
21
min read
LW
link
(www.neelnanda.io)
PSA: Before May 21 is a good time to sign up for cryonics
AlexMennen
May 4, 2025, 4:10 AM
53
points
0
comments
1
min read
LW
link
LessWrong Feed [new, now in beta]
Ruby
May 28, 2025, 7:01 PM
53
points
20
comments
8
min read
LW
link
Cheaters Gonna Cheat Cheat Cheat Cheat Cheat
Zvi
May 9, 2025, 2:30 PM
52
points
4
comments
22
min read
LW
link
(thezvi.wordpress.com)
Management is the Near Future
jefftk
May 17, 2025, 2:50 AM
52
points
10
comments
2
min read
LW
link
(www.jefftk.com)
Shift Resources to Advocacy Now (Post 4 of 6 on AI Governance)
Mass_Driver
May 28, 2025, 1:19 AM
51
points
18
comments
32
min read
LW
link
America Makes AI Chip Diffusion Deal with UAE and KSA
Zvi
May 19, 2025, 7:10 PM
51
points
7
comments
27
min read
LW
link
(thezvi.wordpress.com)
Reward button alignment
Steven Byrnes
May 22, 2025, 5:36 PM
50
points
15
comments
12
min read
LW
link
Can We Naturalize Moral Epistemology?
tylermjohn
21 May 2025 14:25 UTC
50
points
22
comments
6
min read
LW
link
Google Logo Ligature Bug
jefftk
18 May 2025 2:40 UTC
49
points
7
comments
1
min read
LW
link
(www.jefftk.com)
Google I/O Day
Zvi
21 May 2025 22:00 UTC
49
points
0
comments
20
min read
LW
link
(thezvi.wordpress.com)
Problems with instruction-following as an alignment target
Seth Herd
15 May 2025 15:41 UTC
48
points
14
comments
10
min read
LW
link
AI #116: If Anyone Builds It, Everyone Dies
Zvi
15 May 2025 15:10 UTC
47
points
5
comments
42
min read
LW
link
(thezvi.wordpress.com)
Re SMTM: negative feedback on negative feedback
Steven Byrnes
14 May 2025 19:50 UTC
46
points
1
comment
22
min read
LW
link
D&D.Sci: The Choosing Ones
abstractapplic
17 May 2025 15:26 UTC
46
points
16
comments
1
min read
LW
link
Overview: AI Safety Outreach Grassroots Orgs
Severin T. Seehrich
and
Benjamin Schmidt
4 May 2025 17:39 UTC
46
points
8
comments
2
min read
LW
link