C’mon guys, Deliberate Practice is Real · Raemon · Feb 5, 2025, 10:33 PM · 99 points · 25 comments · 9 min read · LW link
The Risk of Gradual Disempowerment from AI · Zvi · Feb 5, 2025, 10:10 PM · 87 points · 20 comments · 20 min read · LW link · (thezvi.wordpress.com)
Wired on: “DOGE personnel with admin access to Federal Payment System” · Raemon · Feb 5, 2025, 9:32 PM · 88 points · 45 comments · 2 min read · LW link · (web.archive.org)
On AI Scaling · harsimony · Feb 5, 2025, 8:24 PM · 6 points · 3 comments · 8 min read · LW link · (splittinginfinity.substack.com)
The State of Metaculus · ChristianWilliams · Feb 5, 2025, 7:17 PM · 21 points · 0 comments · LW link · (www.metaculus.com)
Post-hoc reasoning in chain of thought · Kyle Cox · Feb 5, 2025, 6:58 PM · 17 points · 0 comments · 11 min read · LW link
DeepSeek-R1 for Beginners · Anton Razzhigaev · Feb 5, 2025, 6:58 PM · 12 points · 0 comments · 8 min read · LW link
Making the case for average-case AI Control · Nathaniel Mitrani · Feb 5, 2025, 6:56 PM · 4 points · 0 comments · 5 min read · LW link
[Question] Alignment Paradox and a Request for Harsh Criticism · Bridgett Kay · Feb 5, 2025, 6:17 PM · 6 points · 7 comments · 1 min read · LW link
Introducing International AI Governance Alliance (IAIGA) · jamesnorris · Feb 5, 2025, 4:02 PM · 1 point · 0 comments · 1 min read · LW link
Introducing Collective Action for Existential Safety: 80+ actions individuals, organizations, and nations can take to improve our existential safety · jamesnorris · Feb 5, 2025, 4:02 PM · −9 points · 2 comments · 1 min read · LW link
Language Models Use Trigonometry to Do Addition · Subhash Kantamneni · Feb 5, 2025, 1:50 PM · 76 points · 1 comment · 10 min read · LW link
Deploying the Observer will save humanity from existential threats · Aram Panasenco · Feb 5, 2025, 10:39 AM · −11 points · 8 comments · 1 min read · LW link
The Domain of Orthogonality · mgfcatherall · Feb 5, 2025, 8:14 AM · 1 point · 0 comments · 7 min read · LW link
Reviewing LessWrong: Screwtape’s Basic Answer · Screwtape · Feb 5, 2025, 4:30 AM · 96 points · 18 comments · 6 min read · LW link
[Question] Why isn’t AI containment the primary AI safety strategy? · OKlogic · Feb 5, 2025, 3:54 AM · 1 point · 3 comments · 3 min read · LW link
Nick Land: Orthogonality · lumpenspace · Feb 4, 2025, 9:07 PM · 12 points · 37 comments · 8 min read · LW link
What working on AI safety taught me about B2B SaaS sales · purple fire · Feb 4, 2025, 8:50 PM · 7 points · 12 comments · 5 min read · LW link
Subjective Naturalism in Decision Theory: Savage vs. Jeffrey–Bolker · Daniel Herrmann, Aydin Mohseni, and ben_levinstein · Feb 4, 2025, 8:34 PM · 45 points · 22 comments · 5 min read · LW link
Anti-Slop Interventions? · abramdemski · Feb 4, 2025, 7:50 PM · 76 points · 33 comments · 6 min read · LW link
Can Persuasion Break AI Safety? Exploring the Interplay Between Fine-Tuning, Attacks, and Guardrails · Devina Jain · Feb 4, 2025, 7:10 PM · 3 points · 0 comments · 10 min read · LW link
[Question] Journalism student looking for sources · pinkerton · Feb 4, 2025, 6:58 PM · 11 points · 3 comments · 1 min read · LW link
We’re in Deep Research · Zvi · Feb 4, 2025, 5:20 PM · 45 points · 2 comments · 20 min read · LW link · (thezvi.wordpress.com)
The Capitalist Agent · henophilia · Feb 4, 2025, 3:32 PM · 1 point · 10 comments · 3 min read · LW link · (blog.hermesloom.org)
Forecasting AGI: Insights from Prediction Markets and Metaculus · Alvin Ånestrand · Feb 4, 2025, 1:03 PM · 13 points · 0 comments · 4 min read · LW link · (forecastingaifutures.substack.com)
Ruling Out Lookup Tables · Alfred Harwood · Feb 4, 2025, 10:39 AM · 22 points · 11 comments · 7 min read · LW link
Half-baked idea: a straightforward method for learning environmental goals? · Q Home · Feb 4, 2025, 6:56 AM · 16 points · 7 comments · 5 min read · LW link
Information Versus Action · Screwtape · Feb 4, 2025, 5:13 AM · 27 points · 0 comments · 6 min read · LW link
Utilitarian AI Alignment: Building a Moral Assistant with the Constitutional AI Method · Clément L · Feb 4, 2025, 4:15 AM · 6 points · 1 comment · 13 min read · LW link
Tear Down the Burren · jefftk · Feb 4, 2025, 3:40 AM · 45 points · 2 comments · 2 min read · LW link · (www.jefftk.com)
Constitutional Classifiers: Defending against universal jailbreaks (Anthropic Blog) · Archimedes · Feb 4, 2025, 2:55 AM · 16 points · 1 comment · 1 min read · LW link · (www.anthropic.com)
Can someone, anyone, make superintelligence a more concrete concept? · Ori Nagel · Feb 4, 2025, 2:18 AM · 2 points · 8 comments · 5 min read · LW link
What are the “no free lunch” theorems? · Vishakha and Algon · Feb 4, 2025, 2:02 AM · 19 points · 4 comments · 1 min read · LW link · (aisafety.info)
eliminating bias through language? · KvmanThinking · Feb 4, 2025, 1:52 AM · 1 point · 12 comments · 1 min read · LW link
New Foresight Longevity Bio & Molecular Nano Grants Program · Allison Duettmann · Feb 4, 2025, 12:28 AM · 11 points · 0 comments · 1 min read · LW link
Meta: Frontier AI Framework · Zach Stein-Perlman · Feb 3, 2025, 10:00 PM · 33 points · 2 comments · 1 min read · LW link · (ai.meta.com)
$300 Fermi Model Competition · ozziegooen · Feb 3, 2025, 7:47 PM · 16 points · 18 comments · LW link
Visualizing Interpretability · Darold Davis · Feb 3, 2025, 7:36 PM · 2 points · 0 comments · 4 min read · LW link
Alignment Can Reduce Performance on Simple Ethical Questions · Daan Henselmans · Feb 3, 2025, 7:35 PM · 16 points · 7 comments · 6 min read · LW link
The Overlap Paradigm: Rethinking Data’s Role in Weak-to-Strong Generalization (W2SG) · Serhii Zamrii · Feb 3, 2025, 7:31 PM · 2 points · 0 comments · 11 min read · LW link
Sleeper agents appear resilient to activation steering · Lucy Wingard · Feb 3, 2025, 7:31 PM · 6 points · 0 comments · 7 min read · LW link
Part 1: Enhancing Inner Alignment in CLIP Vision Transformers: Mitigating Reification Bias with SAEs and Grad ECLIP · Gilber A. Corrales · Feb 3, 2025, 7:30 PM · 1 point · 0 comments · 13 min read · LW link
Superintelligence Alignment Proposal · Davey Morse · Feb 3, 2025, 6:47 PM · 5 points · 3 comments · 9 min read · LW link
Gettier Cases [repost] · Antigone · Feb 3, 2025, 6:12 PM · −4 points · 5 comments · 2 min read · LW link
The Self-Reference Trap in Mathematics · Alister Munday · Feb 3, 2025, 4:12 PM · −41 points · 23 comments · 2 min read · LW link
Stopping unaligned LLMs is easy! · Yair Halberstadt · Feb 3, 2025, 3:38 PM · −3 points · 11 comments · 2 min read · LW link
The Outer Levels · Jerdle · Feb 3, 2025, 2:30 PM · 2 points · 3 comments · 6 min read · LW link
o3-mini Early Days · Zvi · Feb 3, 2025, 2:20 PM · 45 points · 0 comments · 15 min read · LW link · (thezvi.wordpress.com)
OpenAI releases deep research agent · Seth Herd · Feb 3, 2025, 12:48 PM (UTC) · 78 points · 21 comments · 3 min read · LW link · (openai.com)
Neuron Activations to CLIP Embeddings: Geometry of Linear Combinations in Latent Space · Roman Malov · Feb 3, 2025, 10:30 AM (UTC) · 4 points · 0 comments · 2 min read · LW link