Chatbots Aren’t Conscious, But They Do Have Consciousness
Timothy Danforth
16 Mar 2026 17:09 UTC
0
points
0
comments
1
min read
LW
link
Compradorization
Benquo
16 Mar 2026 16:10 UTC
42
points
0
comments
21
min read
LW
link
Reasons to be pessimistic (and optimistic) on the future of biosecurity
Abhishaike Mahajan
16 Mar 2026 15:45 UTC
5
points
0
comments
41
min read
LW
link
(www.owlposting.com)
We found an open weight model that games alignment honeypots
Thomas Read
and
Joseph Bloom
16 Mar 2026 12:57 UTC
34
points
0
comments
10
min read
LW
link
Models differ in identity propensities
Jan_Kulveit
,
Raymond Douglas
,
vgel
,
owencb
,
David Duvenaud
and
Ondřej Havlíček
16 Mar 2026 10:45 UTC
31
points
0
comments
14
min read
LW
link
Terrified Comments on Corrigibility in Claude’s Constitution
Zack_M_Davis
16 Mar 2026 7:36 UTC
45
points
6
comments
11
min read
LW
link
Hidden Role Games as a Trusted Model Eval
james.lucassen
16 Mar 2026 4:46 UTC
13
points
0
comments
9
min read
LW
link
(jlucassen.com)
Digital Dichotomy and Why it exists.
Yesh Chala
16 Mar 2026 2:02 UTC
6
points
0
comments
3
min read
LW
link
Hello, World of Mechanistic Interpretability
ValueShift Research
15 Mar 2026 23:36 UTC
6
points
0
comments
5
min read
LW
link
(I am confused about) Non-linear utilitarian scaling
core
15 Mar 2026 23:33 UTC
9
points
3
comments
4
min read
LW
link
Schedule meetings using the Pareto principle
beyarkay
15 Mar 2026 21:18 UTC
2
points
5
comments
2
min read
LW
link
(boydkane.com)
What Are We Actually Evaluating When We Say a Belief “Tracks Truth”?
Alex Glaucon
15 Mar 2026 19:59 UTC
2
points
2
comments
6
min read
LW
link
Superintelligence Risk Education that Scales – Lens Academy
Luc Brinkman
,
meriton
,
pleiotroth
and
Chris-Lons
15 Mar 2026 19:32 UTC
43
points
0
comments
3
min read
LW
link
Emergent stigmergic coordination in AI agents?
David Africa
15 Mar 2026 12:30 UTC
42
points
0
comments
3
min read
LW
link
Some models don’t identify with their official name
jordine
15 Mar 2026 11:02 UTC
26
points
3
comments
3
min read
LW
link
My Willing Complicity In “Human Rights Abuse”
AlphaAndOmega
15 Mar 2026 10:42 UTC
139
points
13
comments
11
min read
LW
link
Less Capable Misaligned ASIs Imply More Suffering
Ihor Kendiukhov
15 Mar 2026 9:36 UTC
7
points
3
comments
6
min read
LW
link
When do intuitions need to be reliable?
Anthony DiGiovanni
15 Mar 2026 4:18 UTC
7
points
3
comments
3
min read
LW
link
The Artificial Self
Jan_Kulveit
,
Raymond Douglas
,
vgel
,
owencb
,
David Duvenaud
and
Ondřej Havlíček
15 Mar 2026 1:37 UTC
97
points
9
comments
29
min read
LW
link
Bridge Thinking and Wall Thinking
Jay Bailey
15 Mar 2026 0:20 UTC
34
points
6
comments
1
min read
LW
link