Chaos Alone is No Bar to Superintelligence
Algon · 6 Oct 2025 22:45 UTC · 12 points · 0 comments · 2 min read (aisafety.info)
We won’t get AIs smart enough to solve alignment but too dumb to rebel
Joe Rogero · 6 Oct 2025 21:49 UTC · 28 points · 16 comments · 5 min read
Notes on the need to lose
Algon · 6 Oct 2025 21:27 UTC · 2 points · 11 comments · 2 min read
Excerpts from my neuroscience to-do list
Steven Byrnes · 6 Oct 2025 21:05 UTC · 28 points · 2 comments · 4 min read
Experience Report—ML4Good Bootcamp Singapore, Sep’25
NurAlam · 6 Oct 2025 18:49 UTC · 5 points · 0 comments · 4 min read
Which differences between sandbagging evaluations and sandbagging safety research are important for control?
lennie · 6 Oct 2025 18:20 UTC · 6 points · 0 comments · 11 min read
Gradual Disempowerment Monthly Roundup
Raymond Douglas · 6 Oct 2025 15:36 UTC · 119 points · 9 comments · 6 min read
Subliminal Learning, the Lottery-Ticket Hypothesis, and Mode Connectivity
David Africa · 6 Oct 2025 15:26 UTC · 23 points · 6 comments · 7 min read
The Origami Men
Tomás B. · 6 Oct 2025 15:25 UTC · 189 points · 14 comments · 16 min read
Medical Roundup #5
Zvi · 6 Oct 2025 15:10 UTC · 39 points · 3 comments · 26 min read (thezvi.wordpress.com)
Sandbagging: distinguishing detection of underperformance from incrimination, and the implications for downstream interventions.
lennie · 6 Oct 2025 14:00 UTC · 8 points · 0 comments · 8 min read
Why I think ECL shouldn’t make you update your cause prio
Jim Buhler · 6 Oct 2025 13:01 UTC · 1 point · 0 comments · 11 min read
[Question] Did Tyler Robinson carry his rifle as claimed by the government?
ChristianKl · 6 Oct 2025 12:46 UTC · 2 points · 15 comments · 1 min read
AI Science Companies: Evidence AGI Is Near
Josh Snider · 6 Oct 2025 10:13 UTC · 6 points · 3 comments · 1 min read (www.joshuasnider.com)
LLMs one-box when in a “hostile telepath” version of Newcomb’s Paradox, except for the one that beat the predictor
Kaj_Sotala · 6 Oct 2025 8:44 UTC · 52 points · 6 comments · 17 min read
Alignment Faking Demo for Congressional Staffers
Alice Blair · 6 Oct 2025 1:44 UTC · 21 points · 2 comments · 3 min read
Do Things for as Many Reasons as Possible
Philipreal · 6 Oct 2025 0:28 UTC · 39 points · 2 comments · 2 min read
One Does Not Simply Walk Away from Omelas
Taylor G. Lunt · 6 Oct 2025 0:04 UTC · 1 point · 5 comments · 7 min read
The quotation mark
Maxwell Peterson · 5 Oct 2025 23:23 UTC · 21 points · 8 comments · 13 min read
The Sadism Spectrum and How to Access It
Dawn Drescher · 5 Oct 2025 23:09 UTC · 14 points · 2 comments · 20 min read (impartial-priorities.org)
Maybe social media algorithms don’t suck
Algon · 5 Oct 2025 18:47 UTC · 70 points · 25 comments · 3 min read
Base64Bench: How good are LLMs at base64, and why care about it?
richbc · 5 Oct 2025 18:07 UTC · 39 points · 10 comments · 11 min read
[Question] What can Canadians do to help end the AI arms race?
Tom938 · 5 Oct 2025 18:03 UTC · 8 points · 7 comments · 2 min read
17 years old, self-taught state control—looking for people who actually get this
Cornelius Caspian · 5 Oct 2025 18:02 UTC · −3 points · 3 comments · 1 min read
Behavior Best-of-N achieves Near Human Performance on Computer Tasks
Baybar · 5 Oct 2025 16:53 UTC · 6 points · 0 comments · 3 min read
Accelerating AI Safety Progress via Technical Methods- Calling Researchers, Founders, and Funders
Martin Leitgab · 5 Oct 2025 16:40 UTC · 1 point · 0 comments · 1 min read
Mini-Symposium on Accelerating AI Safety Progress via Technical Methods—Hybrid In-Person and Virtual
Martin Leitgab · 5 Oct 2025 16:05 UTC · 1 point · 0 comments · 1 min read
[Question] How likely are “s-risks” (large-scale suffering outcomes) from unaligned AI compared to extinction risks?
CanYouFeelTheBenefits · 5 Oct 2025 14:38 UTC · 15 points · 2 comments · 1 min read
LLMs are badly misaligned
Joe Rogero · 5 Oct 2025 14:00 UTC · 27 points · 25 comments · 3 min read
The Counterfactual Quiet AGI Timeline
Davidmanheim · 5 Oct 2025 9:09 UTC · 71 points · 5 comments · 9 min read
AISafety.com Reading Group session 328
Søren Elverlin · 5 Oct 2025 7:51 UTC · 5 points · 0 comments · 1 min read
How the NanoGPT Speedrun WR dropped by 20% in 3 months
larry-dial · 5 Oct 2025 1:05 UTC · 54 points · 9 comments · 9 min read
a quick thought about AI alignment
foodforthought · 5 Oct 2025 0:51 UTC · 10 points · 4 comments · 1 min read
Making Your Pain Worse can Get You What You Want
Logan Riggs · 5 Oct 2025 0:19 UTC · 87 points · 5 comments · 3 min read
Markets in Democracy: What happens when you can sell your vote?
Mike Evron · 4 Oct 2025 23:59 UTC · 4 points · 21 comments · 3 min read
$250 bounties for the best short stories set in our near future world & Brooklyn event to select them
Ramon Gonzalez · 4 Oct 2025 22:49 UTC · 10 points · 0 comments · 2 min read
What I’ve Learnt About How to Sleep
Algon · 4 Oct 2025 20:52 UTC · 29 points · 8 comments · 2 min read
The ‘Magic’ of LLMs: The Function of Language
Joseph Banks · 4 Oct 2025 17:45 UTC · 13 points · 0 comments · 7 min read
Open Philanthropy’s Biosecurity and Pandemic Preparedness Team Is Hiring and Seeking New Grantees
miriam.hinthorn · 4 Oct 2025 17:42 UTC · 3 points · 0 comments · 1 min read
Consider Small Walks at Work
Morpheus · 4 Oct 2025 11:53 UTC · 10 points · 0 comments · 3 min read
Where does Sonnet 4.5’s desire to “not get too comfortable” come from?
Kaj_Sotala · 4 Oct 2025 10:19 UTC · 103 points · 23 comments · 64 min read
Munk Debate on AI: a few observations and opinions
[deactivated] · 4 Oct 2025 0:24 UTC · 2 points · 0 comments · 1 min read
A Workflow for System Prompted Model Organisms
michaelwaves · 3 Oct 2025 21:39 UTC · 1 point · 0 comments · 3 min read
Goodness is harder to achieve than competence
Joe Rogero · 3 Oct 2025 21:32 UTC · 22 points · 0 comments · 3 min read
Memory Decoding Journal Club: Connectomic traces of Hebbian plasticity in the entorhinal-hippocampal system
Devin Ward · 3 Oct 2025 21:24 UTC · 1 point · 0 comments · 1 min read
Good is a smaller target than smart
Joe Rogero · 3 Oct 2025 21:04 UTC · 21 points · 0 comments · 2 min read
Making Sense of Consciousness Part 6: Perceptions of Disembodiment
sarahconstantin · 3 Oct 2025 20:40 UTC · 27 points · 0 comments · 8 min read (sarahconstantin.substack.com)
Recent AI Experiences
abramdemski · 3 Oct 2025 19:32 UTC · 58 points · 5 comments · 6 min read
Our Experience Running Independent Evaluations on LLMs: What Have We Learned?
MAlvarado · 3 Oct 2025 18:26 UTC · 7 points · 1 comment · 5 min read
Do One New Thing A Day To Solve Your Problems
Algon · 3 Oct 2025 17:08 UTC · 208 points · 28 comments · 2 min read