Moral Hazard in Democratic Voting
lsusr · Feb 12, 2025, 11:17 PM · 20 points · 8 comments · 1 min read

MATS Spring 2024 Extension Retrospective
HenningB, Matthew Wearden, Cameron Holmes and Ryan Kidd · Feb 12, 2025, 10:43 PM · 26 points · 1 comment · 15 min read

Hunting for AI Hackers: LLM Agent Honeypot
Reworr R and jacobhaimes · Feb 12, 2025, 8:29 PM · 34 points · 0 comments · 5 min read · (www.apartresearch.com)

Probability of AI-Caused Disaster
Alvin Ånestrand · Feb 12, 2025, 7:40 PM · 2 points · 2 comments · 10 min read · (forecastingaifutures.substack.com)

Two flaws in the Machiavelli Benchmark
TheManxLoiner · Feb 12, 2025, 7:34 PM · 23 points · 0 comments · 3 min read

Gradient Anatomy’s—Hallucination Robustness in Medical Q&A
DieSab · Feb 12, 2025, 7:16 PM · 2 points · 0 comments · 10 min read

Are current LLMs safe for psychotherapy?
PaperBike · Feb 12, 2025, 7:16 PM · 5 points · 4 comments · 1 min read

Comparing the effectiveness of top-down and bottom-up activation steering for bypassing refusal on harmful prompts
Ana Kapros · Feb 12, 2025, 7:12 PM · 7 points · 0 comments · 5 min read

The Paris AI Anti-Safety Summit
Zvi · Feb 12, 2025, 2:00 PM · 129 points · 21 comments · 21 min read · (thezvi.wordpress.com)

Inside the dark forests of the internet
Itay Dreyfus · Feb 12, 2025, 10:20 AM · 10 points · 0 comments · 6 min read · (productidentity.co)

Utility Engineering: Analyzing and Controlling Emergent Value Systems in AIs
Matrice Jacobine · Feb 12, 2025, 9:15 AM · 53 points · 49 comments · (www.emergent-values.ai)

Why you maybe should lift weights, and How to.
samusasuke · Feb 12, 2025, 5:15 AM · 32 points · 29 comments · 9 min read

[Question] how do the CEOs respond to our concerns?
KvmanThinking · Feb 11, 2025, 11:39 PM · −10 points · 7 comments · 1 min read

Where Would Good Forecasts Most Help AI Governance Efforts?
Violet Hour · Feb 11, 2025, 6:15 PM · 11 points · 1 comment · 6 min read

AI Safety at the Frontier: Paper Highlights, January ’25
gasteigerjo · Feb 11, 2025, 4:14 PM · 7 points · 0 comments · 8 min read · (aisafetyfrontier.substack.com)

If Neuroscientists Succeed
Mordechai Rorvig · Feb 11, 2025, 3:33 PM · 9 points · 6 comments · 18 min read

The News is Never Neglected
lsusr · Feb 11, 2025, 2:59 PM · 112 points · 18 comments · 1 min read

Rethinking AI Safety Approach in the Era of Open-Source AI
Weibing Wang · Feb 11, 2025, 2:01 PM · 4 points · 0 comments · 6 min read

What About The Horses?
Maxwell Tabarrok · Feb 11, 2025, 1:59 PM · 15 points · 17 comments · 7 min read · (www.maximum-progress.com)

On Deliberative Alignment
Zvi · Feb 11, 2025, 1:00 PM · 53 points · 1 comment · 6 min read · (thezvi.wordpress.com)

Detecting AI Agent Failure Modes in Simulations
Michael Soareverix · Feb 11, 2025, 11:10 AM · 17 points · 0 comments · 8 min read

World Citizen Assembly about AI—Announcement
Camille Berger · Feb 11, 2025, 10:51 AM · 26 points · 1 comment

Visual Reference for Frontier Large Language Models
kenakofer · Feb 11, 2025, 5:14 AM · 14 points · 0 comments · 1 min read · (kenan.schaefkofer.com)

Rational Effective Utopia & Narrow Way There: Multiversal AI Alignment, Place AI, New Ethicophysics… (Updated)
ank · Feb 11, 2025, 3:21 AM · 13 points · 8 comments · 35 min read

Arguing for the Truth? An Inference-Only Study into AI Debate
denisemester · Feb 11, 2025, 3:04 AM · 7 points · 0 comments · 16 min read

Why Did Elon Musk Just Offer to Buy Control of OpenAI for $100 Billion?
garrison · Feb 11, 2025, 12:20 AM · 208 points · 8 comments · (garrisonlovely.substack.com)

Positive Directions
G Wood · Feb 11, 2025, 12:00 AM · 0 points · 0 comments · 4 min read

Logical Correlation
niplav · Feb 10, 2025, 11:29 PM · 24 points · 7 comments · 10 min read

Proof idea: SLT to AIT
Lucius Bushnaq · Feb 10, 2025, 11:14 PM · 40 points · 15 comments · 6 min read

LW/ACX social meetup
Stefan · Feb 10, 2025, 9:12 PM · 2 points · 0 comments · 1 min read

A Bearish Take on AI, as a Treat
rats · Feb 10, 2025, 7:22 PM · 11 points · 0 comments · 4 min read · (open.substack.com)

Beyond ELO: Rethinking Chess Skill as a Multidimensional Random Variable
Oliver Oswald · Feb 10, 2025, 7:19 PM · 6 points · 7 comments · 2 min read

Claude is More Anxious than GPT; Personality is an axis of interpretability in language models
future_detective · Feb 10, 2025, 7:19 PM · 2 points · 2 comments · 8 min read · (dhealy.substack.com)

Notes on Occam via Solomonoff vs. hierarchical Bayes
JesseClifton · Feb 10, 2025, 5:55 PM · 29 points · 7 comments · 4 min read

Sleeping Beauty: an Accuracy-based Approach
glauberdebona · Feb 10, 2025, 3:40 PM · 7 points · 2 comments · 7 min read

Political Idolatry
Arturo Macias · Feb 10, 2025, 3:26 PM · −8 points · 7 comments · 2 min read

ML4Good Colombia—Applications Open to LatAm Participants
Alejandro Acelas and Manuela García · Feb 10, 2025, 3:03 PM · 4 points · 0 comments · 1 min read

Nonpartisan AI safety
Yair Halberstadt · Feb 10, 2025, 2:55 PM · 30 points · 4 comments · 2 min read

Opinion Article Scoring System
ciaran · Feb 10, 2025, 2:32 PM · 1 point · 0 comments · 5 min read

Levels of Friction
Zvi · Feb 10, 2025, 1:10 PM · 149 points · 8 comments · 12 min read · (thezvi.wordpress.com)

Baumol effect vs Jevons paradox
Hzn · Feb 10, 2025, 8:28 AM · 0 points · 0 comments · 1 min read · (hzn33.neocities.org)

[Question] A Simulation of Automation economics?
qbolec · Feb 10, 2025, 8:11 AM · 10 points · 1 comment · 1 min read

[Question] Should I Divest from AI?
OKlogic · Feb 10, 2025, 3:29 AM · 6 points · 4 comments · 1 min read

OpenAI lied about SFT vs. RLHF
sanxiyn · Feb 10, 2025, 3:24 AM · 10 points · 2 comments · 1 min read · (x.com)

“Self-Blackmail” and Alternatives
jessicata · 9 Feb 2025 23:20 UTC · 19 points · 12 comments · 7 min read · (unstableontology.com)

Altman blog on post-AGI world
Julian Bradshaw · 9 Feb 2025 21:52 UTC · 29 points · 10 comments · 1 min read · (blog.samaltman.com)

Forecasting newsletter #2/2025: Forecasting meetup network
NunoSempere · 9 Feb 2025 18:07 UTC · 13 points · 0 comments · 4 min read · (forecasting.substack.com)

How identical twin sisters feel about nieces vs their own daughters
Dave Lindbergh · 9 Feb 2025 17:36 UTC · 4 points · 19 comments · 1 min read

Two hemispheres—I do not think it means what you think it means
Viliam · 9 Feb 2025 15:33 UTC · 109 points · 21 comments · 14 min read

The Structure of Professional Revolutions
SebastianG · 9 Feb 2025 13:23 UTC · 8 points · 0 comments · 4 min read