Chaos Alone is No Bar to Superintelligence | Algon | 6 Oct 2025 22:45 UTC | 11 points | 0 comments | 2 min read | (aisafety.info)
We won’t get AIs smart enough to solve alignment but too dumb to rebel | Joe Rogero | 6 Oct 2025 21:49 UTC | 28 points | 16 comments | 5 min read
Notes on the need to lose | Algon | 6 Oct 2025 21:27 UTC | 2 points | 6 comments | 2 min read
Excerpts from my neuroscience to-do list | Steven Byrnes | 6 Oct 2025 21:05 UTC | 26 points | 1 comment | 4 min read
Experience Report—ML4Good Bootcamp Singapore, Sep’25 | NurAlam | 6 Oct 2025 18:49 UTC | 2 points | 0 comments | 4 min read
Which differences between sandbagging evaluations and sandbagging safety research are important for control? | lennie | 6 Oct 2025 18:20 UTC | 1 point | 0 comments | 11 min read
Gradual Disempowerment Monthly Roundup | Raymond Douglas | 6 Oct 2025 15:36 UTC | 93 points | 7 comments | 6 min read
Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity | David Africa | 6 Oct 2025 15:26 UTC | 16 points | 3 comments | 7 min read
The Origami Men | Tomás B. | 6 Oct 2025 15:25 UTC | 138 points | 9 comments | 16 min read
Medical Roundup #5 | Zvi | 6 Oct 2025 15:10 UTC | 26 points | 2 comments | 26 min read | (thezvi.wordpress.com)
Sandbagging: distinguishing detection of underperformance from incrimination, and the implications for downstream interventions. | lennie | 6 Oct 2025 14:00 UTC | 1 point | 0 comments | 8 min read
Why I think ECL shouldn’t make you update your cause prio | Jim Buhler | 6 Oct 2025 13:01 UTC | 2 points | 0 comments | 11 min read
[Question] Did Tyler Robinson carry his rifle as claimed by the government? | ChristianKl | 6 Oct 2025 12:46 UTC | 4 points | 9 comments | 1 min read
AI Science Companies: Evidence AGI Is Near | Josh Snider | 6 Oct 2025 10:13 UTC | 5 points | 3 comments | 1 min read | (www.joshuasnider.com)
LLMs one-box when in a “hostile telepath” version of Newcomb’s Paradox, except for the one that beat the predictor | Kaj_Sotala | 6 Oct 2025 8:44 UTC | 47 points | 6 comments | 17 min read
Alignment Faking Demo for Congressional Staffers | Alice Blair | 6 Oct 2025 1:44 UTC | 19 points | 0 comments | 3 min read
Do Things for as Many Reasons as Possible | Philipreal | 6 Oct 2025 0:28 UTC | 35 points | 1 comment | 2 min read
One Does Not Simply Walk Away from Omelas | Taylor G. Lunt | 6 Oct 2025 0:04 UTC | 4 points | 5 comments | 7 min read
The quotation mark | Maxwell Peterson | 5 Oct 2025 23:23 UTC | 19 points | 8 comments | 13 min read
The Sadism Spectrum and How to Access It | Dawn Drescher | 5 Oct 2025 23:09 UTC | 13 points | 2 comments | 20 min read | (impartial-priorities.org)
Maybe social media algorithms don’t suck | Algon | 5 Oct 2025 18:47 UTC | 64 points | 18 comments | 3 min read
Base64Bench: How good are LLMs at base64, and why care about it? | richbc | 5 Oct 2025 18:07 UTC | 31 points | 6 comments | 11 min read
[Question] What can Canadians do to help end the AI arms race? | Tom938 | 5 Oct 2025 18:03 UTC | 8 points | 7 comments | 2 min read
17 years old, self-taught state control—looking for people who actually get this | Cornelius Caspian | 5 Oct 2025 18:02 UTC | −3 points | 3 comments | 1 min read
Behavior Best-of-N achieves Near Human Performance on Computer Tasks | Baybar | 5 Oct 2025 16:53 UTC | 6 points | 0 comments | 3 min read
Accelerating AI Safety Progress via Technical Methods- Calling Researchers, Founders, and Funders | Martin Leitgab | 5 Oct 2025 16:40 UTC | 1 point | 0 comments | 1 min read
Mini-Symposium on Accelerating AI Safety Progress via Technical Methods—Hybrid In-Person and Virtual | Martin Leitgab | 5 Oct 2025 16:05 UTC | 1 point | 0 comments | 1 min read
[Question] How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks? | CanYouFeelTheBenefits | 5 Oct 2025 14:38 UTC | 14 points | 1 comment | 1 min read
LLMs are badly misaligned | Joe Rogero | 5 Oct 2025 14:00 UTC | 27 points | 25 comments | 3 min read
The Counterfactual Quiet AGI Timeline | Davidmanheim | 5 Oct 2025 9:09 UTC | 64 points | 5 comments | 9 min read
AISafety.com Reading Group session 328 | Søren Elverlin | 5 Oct 2025 7:51 UTC | 5 points | 0 comments | 1 min read
How the NanoGPT Speedrun WR dropped by 20% in 3 months | larry-dial | 5 Oct 2025 1:05 UTC | 26 points | 9 comments | 9 min read
a quick thought about AI alignment | foodforthought | 5 Oct 2025 0:51 UTC | 10 points | 4 comments | 1 min read
Making Your Pain Worse can Get You What You Want | Logan Riggs | 5 Oct 2025 0:19 UTC | 76 points | 4 comments | 3 min read
Markets in Democracy: What happens when you can sell your vote? | Mike Evron | 4 Oct 2025 23:59 UTC | 4 points | 20 comments | 3 min read
$250 bounties for the best short stories set in our near future world & Brooklyn event to select them | Ramon Gonzalez | 4 Oct 2025 22:49 UTC | 10 points | 0 comments | 2 min read
What I’ve Learnt About How to Sleep | Algon | 4 Oct 2025 20:52 UTC | 25 points | 7 comments | 2 min read
The ‘Magic’ of LLMs: The Function of Language | Joseph Banks | 4 Oct 2025 17:45 UTC | 13 points | 0 comments | 7 min read
Open Philanthropy’s Biosecurity and Pandemic Preparedness Team Is Hiring and Seeking New Grantees | miriam.hinthorn | 4 Oct 2025 17:42 UTC | 3 points | 0 comments | 1 min read
Consider Small Walks at Work | Morpheus | 4 Oct 2025 11:53 UTC | 10 points | 0 comments | 3 min read
Where does Sonnet 4.5’s desire to “not get too comfortable” come from? | Kaj_Sotala | 4 Oct 2025 10:19 UTC | 91 points | 16 comments | 64 min read
A Workflow for System Prompted Model Organisms | michaelwaves | 3 Oct 2025 21:39 UTC | 1 point | 0 comments | 3 min read
Goodness is harder to achieve than competence | Joe Rogero | 3 Oct 2025 21:32 UTC | 22 points | 0 comments | 3 min read
Memory Decoding Journal Club: Connectomic traces of Hebbian plasticity in the entorhinal-hippocampal system | Devin Ward | 3 Oct 2025 21:24 UTC | 1 point | 0 comments | 1 min read
Good is a smaller target than smart | Joe Rogero | 3 Oct 2025 21:04 UTC | 21 points | 0 comments | 2 min read
Making Sense of Consciousness Part 6: Perceptions of Disembodiment | sarahconstantin | 3 Oct 2025 20:40 UTC | 27 points | 0 comments | 8 min read | (sarahconstantin.substack.com)
Recent AI Experiences | abramdemski | 3 Oct 2025 19:32 UTC | 54 points | 1 comment | 6 min read
Our Experience Running Independent Evaluations on LLMs: What Have We Learned? | MAlvarado | 3 Oct 2025 18:26 UTC | 7 points | 1 comment | 5 min read
Do One New Thing A Day To Solve Your Problems | Algon | 3 Oct 2025 17:08 UTC | 102 points | 5 comments | 2 min read
ENAIS is looking for an Executive Director (apply by 20th October) | gergogaspar and ENAIS | 3 Oct 2025 15:29 UTC | 11 points | 0 comments | 2 min read