LessWrong — Archive
Notes on the Presidential Election of 1836 · Arjun Panickssery · Feb 13, 2025, 11:40 PM · 23 points · 0 comments · 7 min read · (arjunpanickssery.substack.com)
Static Place AI Makes Agentic AI Redundant: Multiversal AI Alignment & Rational Utopia · ank · Feb 13, 2025, 10:35 PM · 1 point · 2 comments · 11 min read
I’m making a ttrpg about life in an intentional community during the last year before the Singularity · bgaesop · Feb 13, 2025, 9:54 PM · 11 points · 2 comments · 2 min read
SWE Automation Is Coming: Consider Selling Your Crypto · A_donor · Feb 13, 2025, 8:17 PM · 12 points · 8 comments · 1 min read
≤10-year Timelines Remain Unlikely Despite DeepSeek and o3 · Rafael Harth · Feb 13, 2025, 7:21 PM · 52 points · 67 comments · 15 min read
System 2 Alignment · Seth Herd · Feb 13, 2025, 7:17 PM · 35 points · 0 comments · 22 min read
Murder plots are infohazards · Chris Monteiro · Feb 13, 2025, 7:15 PM · 303 points · 44 comments · 2 min read
Sparse Autoencoder Feature Ablation for Unlearning · aludert · Feb 13, 2025, 7:13 PM · 3 points · 0 comments · 11 min read
What is it to solve the alignment problem? · Joe Carlsmith · Feb 13, 2025, 6:42 PM · 31 points · 6 comments · 19 min read · (joecarlsmith.substack.com)
Self-dialogue: Do behaviorist rewards make scheming AGIs? · Steven Byrnes · Feb 13, 2025, 6:39 PM · 43 points · 0 comments · 46 min read
How do we solve the alignment problem? · Joe Carlsmith · Feb 13, 2025, 6:27 PM · 63 points · 9 comments · 6 min read · (joecarlsmith.substack.com)
Ambiguous out-of-distribution generalization on an algorithmic task · Wilson Wu and Louis Jaburi · Feb 13, 2025, 6:24 PM · 83 points · 6 comments · 11 min read
Teaching AI to reason: this year’s most important story · Benjamin_Todd · Feb 13, 2025, 5:40 PM · 10 points · 0 comments · 10 min read · (benjamintodd.substack.com)
AI #103: Show Me the Money · Zvi · Feb 13, 2025, 3:20 PM · 30 points · 9 comments · 58 min read · (thezvi.wordpress.com)
OpenAI’s NSFW policy: user safety, harm reduction, and AI consent · 8e9 · Feb 13, 2025, 1:59 PM · 4 points · 3 comments · 2 min read
Studies of Human Error Rate · tin482 · Feb 13, 2025, 1:43 PM · 15 points · 3 comments · 1 min read
the dumbest theory of everything · lostinwilliamsburg · Feb 13, 2025, 7:57 AM · −1 points · 0 comments · 7 min read
Skepticism towards claims about the views of powerful institutions · tlevin · Feb 13, 2025, 7:40 AM · 46 points · 2 comments · 4 min read
Virtue signaling, and the “humans-are-wonderful” bias, as a trust exercise · lc · Feb 13, 2025, 6:59 AM · 44 points · 16 comments · 4 min read
My model of what is going on with LLMs · Cole Wyeth · Feb 13, 2025, 3:43 AM · 104 points · 49 comments · 7 min read
Not all capabilities will be created equal: focus on strategically superhuman agents · benwr · Feb 13, 2025, 1:24 AM · 62 points · 8 comments · 3 min read
LLMs can teach themselves to better predict the future · Ben Turtel · Feb 13, 2025, 1:01 AM · 0 points · 1 comment · 1 min read · (arxiv.org)
Dovetail’s agent foundations fellowship talks & discussion · Alex_Altair · Feb 13, 2025, 12:49 AM · 10 points · 0 comments · 1 min read
Extended analogy between humans, corporations, and AIs. · Daniel Kokotajlo · Feb 13, 2025, 12:03 AM · 36 points · 2 comments · 6 min read
Moral Hazard in Democratic Voting · lsusr · Feb 12, 2025, 11:17 PM · 20 points · 8 comments · 1 min read
MATS Spring 2024 Extension Retrospective · HenningB, Matthew Wearden, Cameron Holmes and Ryan Kidd · Feb 12, 2025, 10:43 PM · 26 points · 1 comment · 15 min read
Hunting for AI Hackers: LLM Agent Honeypot · Reworr R and jacobhaimes · Feb 12, 2025, 8:29 PM · 34 points · 0 comments · 5 min read · (www.apartresearch.com)
Probability of AI-Caused Disaster · Alvin Ånestrand · Feb 12, 2025, 7:40 PM · 2 points · 2 comments · 10 min read · (forecastingaifutures.substack.com)
Two flaws in the Machiavelli Benchmark · TheManxLoiner · Feb 12, 2025, 7:34 PM · 23 points · 0 comments · 3 min read
Gradient Anatomy’s—Hallucination Robustness in Medical Q&A · DieSab · Feb 12, 2025, 7:16 PM · 2 points · 0 comments · 10 min read
Are current LLMs safe for psychotherapy? · PaperBike · Feb 12, 2025, 7:16 PM · 5 points · 4 comments · 1 min read
Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts · Ana Kapros · Feb 12, 2025, 7:12 PM · 7 points · 0 comments · 5 min read
The Paris AI Anti-Safety Summit · Zvi · Feb 12, 2025, 2:00 PM · 129 points · 21 comments · 21 min read · (thezvi.wordpress.com)
Inside the dark forests of the internet · Itay Dreyfus · Feb 12, 2025, 10:20 AM · 10 points · 0 comments · 6 min read · (productidentity.co)
Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs · Matrice Jacobine · Feb 12, 2025, 9:15 AM · 53 points · 49 comments · (www.emergent-values.ai)
Why you maybe should lift weights, and How to. · samusasuke · Feb 12, 2025, 5:15 AM · 32 points · 29 comments · 9 min read
[Question] how do the CEOs respond to our concerns? · KvmanThinking · Feb 11, 2025, 11:39 PM · −10 points · 7 comments · 1 min read
Where Would Good Forecasts Most Help AI Governance Efforts? · Violet Hour · Feb 11, 2025, 6:15 PM · 11 points · 1 comment · 6 min read
AI Safety at the Frontier: Paper Highlights, January ’25 · gasteigerjo · Feb 11, 2025, 4:14 PM · 7 points · 0 comments · 8 min read · (aisafetyfrontier.substack.com)
If Neuroscientists Succeed · Mordechai Rorvig · Feb 11, 2025, 3:33 PM · 9 points · 6 comments · 18 min read
The News is Never Neglected · lsusr · Feb 11, 2025, 2:59 PM · 112 points · 18 comments · 1 min read
Rethinking AI Safety Approach in the Era of Open-Source AI · Weibing Wang · Feb 11, 2025, 2:01 PM · 4 points · 0 comments · 6 min read
What About The Horses? · Maxwell Tabarrok · Feb 11, 2025, 1:59 PM · 15 points · 17 comments · 7 min read · (www.maximum-progress.com)
On Deliberative Alignment · Zvi · Feb 11, 2025, 1:00 PM · 53 points · 1 comment · 6 min read · (thezvi.wordpress.com)
Detecting AI Agent Failure Modes in Simulations · Michael Soareverix · Feb 11, 2025, 11:10 AM · 17 points · 0 comments · 8 min read
World Citizen Assembly about AI—Announcement · Camille Berger · Feb 11, 2025, 10:51 AM · 26 points · 1 comment
Visual Reference for Frontier Large Language Models · kenakofer · Feb 11, 2025, 5:14 AM · 14 points · 0 comments · 1 min read · (kenan.schaefkofer.com)
Rational Effective Utopia & Narrow Way There: Multiversal AI Alignment, Place AI, New Ethicophysics… (Updated) · ank · Feb 11, 2025, 3:21 AM · 13 points · 8 comments · 35 min read
Arguing for the Truth? An Inference-Only Study into AI Debate · denisemester · Feb 11, 2025, 3:04 AM · 7 points · 0 comments · 16 min read
Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion? · garrison · Feb 11, 2025, 12:20 AM · 208 points · 8 comments · (garrisonlovely.substack.com)