Archive
LW/ACX Social Meetup · Stefan · Mar 12, 2025, 11:13 PM · 2 points · 0 comments · 1 min read
I grade every NBA basketball game I watch based on enjoyability · proshowersinger · Mar 12, 2025, 9:46 PM · 24 points · 2 comments · 4 min read
Kairos is hiring a Head of Operations/Founding Generalist · agucova · Mar 12, 2025, 8:58 PM · 6 points · 0 comments
USAID Outlook: A Metaculus Forecasting Series · ChristianWilliams · Mar 12, 2025, 8:34 PM · 9 points · 0 comments · (www.metaculus.com)
What is instrumental convergence? · Vishakha and Algon · Mar 12, 2025, 8:28 PM · 2 points · 0 comments · 2 min read · (aisafety.info)
Revising Stages-Oversight Reveals Greater Situational Awareness in LLMs · Sanyu Rajakumar · Mar 12, 2025, 5:56 PM · 16 points · 0 comments · 13 min read
Why Obedient AI May Be the Real Catastrophe · G~ · Mar 12, 2025, 5:50 PM · 5 points · 2 comments · 3 min read
Your Communication Preferences Aren’t Law · Jonathan Moregård · Mar 12, 2025, 5:20 PM · 25 points · 4 comments · 1 min read · (honestliving.substack.com)
Reflections on Neuralese · Alice Blair · Mar 12, 2025, 4:29 PM · 28 points · 0 comments · 5 min read
Field tests of semi-rationality in Brazilian military training · P. João · Mar 12, 2025, 4:14 PM · 31 points · 0 comments · 2 min read
Many life-saving drugs fail for lack of funding. But there’s a solution: desperate rich people · Mvolz · Mar 12, 2025, 3:24 PM · 17 points · 0 comments · 1 min read · (www.theguardian.com)
The Most Forbidden Technique · Zvi · Mar 12, 2025, 1:20 PM · 143 points · 9 comments · 17 min read · (thezvi.wordpress.com)
You don’t actually need a physical multiverse to explain anthropic fine-tuning. · Fraser · Mar 12, 2025, 7:33 AM · 7 points · 8 comments · 3 min read · (frvser.com)
AI Can’t Write Good Fiction · JustisMills · Mar 12, 2025, 6:11 AM · 38 points · 24 comments · 7 min read · (justismills.substack.com)
Existing UDTs test the limits of Bayesianism (and consistency) · Cole Wyeth · Mar 12, 2025, 4:09 AM · 28 points · 21 comments · 7 min read
(Anti)Aging 101 · George3d6 · Mar 12, 2025, 3:59 AM · 5 points · 2 comments · 3 min read · (cerebralab.com)
The Grapes of Hardness · adamShimi · Mar 11, 2025, 9:01 PM · 8 points · 0 comments · 5 min read · (formethods.substack.com)
Don’t over-update on FrontierMath results · David Matolcsi · Mar 11, 2025, 8:44 PM · 51 points · 7 comments · 9 min read
Response to Scott Alexander on Imprisonment · Zvi · Mar 11, 2025, 8:40 PM · 40 points · 4 comments · 9 min read · (thezvi.wordpress.com)
Paths and waystations in AI safety · Joe Carlsmith · Mar 11, 2025, 6:52 PM · 41 points · 1 comment · 11 min read · (joecarlsmith.substack.com)
Meridian Cambridge Visiting Researcher Programme: Turn AI safety ideas into funded projects in one week! · Meridian Cambridge · Mar 11, 2025, 5:46 PM · 13 points · 0 comments · 2 min read
Elon Musk May Be Transitioning to Bipolar Type I · Cyborg25 · Mar 11, 2025, 5:45 PM · 83 points · 22 comments · 4 min read
Scaling AI Regulation: Realistically, what Can (and Can’t) Be Regulated? · Katalina Hernandez · Mar 11, 2025, 4:51 PM · 3 points · 1 comment · 3 min read
How Language Models Understand Nullability · Anish Tondwalkar and Alex Sanchez-Stern · Mar 11, 2025, 3:57 PM · 5 points · 0 comments · 2 min read · (dmodel.ai)
Forethought: a new AI macrostrategy group · Max Dalton, Tom Davidson, wdmacaskill and AmritSidhu-Brar · Mar 11, 2025, 3:39 PM · 18 points · 0 comments · 3 min read
Preparing for the Intelligence Explosion · fin and wdmacaskill · Mar 11, 2025, 3:38 PM · 78 points · 17 comments · 1 min read · (www.forethought.org)
stop solving problems that have already been solved · dhruvmethi · Mar 11, 2025, 3:30 PM · 10 points · 3 comments · 8 min read
AI Control May Increase Existential Risk · Jan_Kulveit · Mar 11, 2025, 2:30 PM · 98 points · 13 comments · 1 min read
When is it Better to Train on the Alignment Proxy? · dil-leik-og · Mar 11, 2025, 1:35 PM · 14 points · 0 comments · 9 min read
A different take on the Musk v OpenAI preliminary injunction order · TFD · Mar 11, 2025, 12:46 PM · 8 points · 0 comments · 20 min read · (www.thefloatingdroid.com)
Do reasoning models use their scratchpad like we do? Evidence from distilling paraphrases · Fabien Roger · Mar 11, 2025, 11:52 AM · 121 points · 23 comments · 11 min read · (alignment.anthropic.com)
A Hogwarts Guide to Citizenship · WillPetillo · Mar 11, 2025, 5:50 AM · 7 points · 1 comment · 3 min read
Cognitive Reframing—How to Overcome Negative Thought Patterns and Behaviors · Declan Molony · Mar 11, 2025, 4:56 AM · 11 points · 0 comments · 4 min read
Trojan Sky · Richard_Ngo · Mar 11, 2025, 3:14 AM · 245 points · 39 comments · 12 min read · (www.narrativeark.xyz)
OpenAI: Detecting misbehavior in frontier reasoning models · Daniel Kokotajlo · Mar 11, 2025, 2:17 AM · 183 points · 26 comments · 4 min read · (openai.com)
HPMOR Anniversary Parties: Coordination, Resources, and Discussion · Screwtape · Mar 11, 2025, 1:30 AM · 52 points · 6 comments · 7 min read
Positional kernels of attention heads · Alex Gibson · Mar 10, 2025, 11:17 PM · 9 points · 0 comments · 12 min read
Progress links and short notes, 2025-03-10 · jasoncrawford · Mar 10, 2025, 8:27 PM · 8 points · 0 comments · 4 min read · (newsletter.rootsofprogress.org)
The Manus Marketing Madness · Zvi · Mar 10, 2025, 8:10 PM · 54 points · 0 comments · 24 min read · (thezvi.wordpress.com)
You can just play · aswath krishnan · Mar 10, 2025, 8:00 PM · −5 points · 0 comments · 2 min read
How to Use Prompt Engineering to Rewire Your Brain · aswath krishnan · Mar 10, 2025, 8:00 PM · 1 point · 0 comments · 5 min read · (www.aswathkrishnan.com)
When Independent Optimization Is Worse Than Randomness · Chaotic rationalist · Mar 10, 2025, 7:46 PM · −4 points · 0 comments · 2 min read
Stress exists only where the Mind makes it · Noahh · Mar 10, 2025, 7:44 PM · 5 points · 2 comments · 4 min read
Counterargument to Godel’s Modal Ontological Argument · Wynn · Mar 10, 2025, 7:38 PM · −1 points · 0 comments · 4 min read
[Question] How much do frontier LLMs code and browse while in training? · Joe Rogero · Mar 10, 2025, 7:34 PM · 7 points · 0 comments · 1 min read
Observations on self-supervised Learning for vision · Dinkar Juyal · Mar 10, 2025, 7:31 PM · 3 points · 0 comments · 5 min read
Introducing 11 New AI Safety Organizations—Catalyze’s Winter 24/25 London Incubation Program Cohort · Alexandra Bos · Mar 10, 2025, 7:26 PM · 70 points · 0 comments
The Jackpot Jinx (or why “Superintelligence Strategy” is wrong) · E.G. Blee-Goldman · Mar 10, 2025, 7:18 PM · 13 points · 0 comments · 5 min read
Effective AI Outreach | A Data Driven Approach · NoahCWilson · Mar 10, 2025, 7:18 PM · 1 point · 0 comments · 15 min read
Emergent AI Society. Tasks, Scarcity, Talks · Andrey Seryakov · Mar 10, 2025, 7:18 PM · 1 point · 0 comments · 5 min read