METR: Measuring AI Ability to Complete Long Tasks · Zach Stein-Perlman · 19 Mar 2025 16:00 UTC · 173 points, 44 comments, 1 min read · LW link (metr.org)

Why abortion looks more okay to us than killing babies · cousin_it · 24 Nov 2010 10:08 UTC · 25 points, 67 comments, 1 min read · LW link

[Question] How far along Metr's law can AI start automating or helping with alignment research? · Christopher King · 20 Mar 2025 15:58 UTC · 18 points, 17 comments, 1 min read · LW link

Blues, Greens and abortion · Snowyowl · 5 Mar 2011 19:15 UTC · 17 points, 158 comments, 1 min read · LW link

Why White-Box Redteaming Makes Me Feel Weird · Zygi Straznickas · 16 Mar 2025 18:54 UTC · 171 points, 28 comments, 3 min read · LW link

[Question] Any mistakes in my understanding of Transformers? · Kallistos · 21 Mar 2025 0:34 UTC · 1 point, 0 comments, 1 min read · LW link

The principle of genomic liberty · TsviBT · 19 Mar 2025 14:27 UTC · 87 points, 16 comments, 17 min read · LW link

A Critique of "Utility" · Zero Contradictions · 20 Mar 2025 23:21 UTC · −6 points, 1 comment, 2 min read · LW link (thewaywardaxolotl.blogspot.com)

Interpretability as Compression: Reconsidering SAE Explanations of Neural Activations with MDL-SAEs · Kola Ayonrinde, Michael Pearce and Lee Sharkey · 23 Aug 2024 18:52 UTC · 42 points, 8 comments, 16 min read · LW link

[Question] Why am I getting downvoted on Lesswrong? · Oxidize · 19 Mar 2025 18:32 UTC · 4 points, 13 comments, 1 min read · LW link

Counter-theses on Sleep · Natália · 21 Mar 2022 23:21 UTC · 447 points, 135 comments, 15 min read · LW link · 1 review

Algebraic Linguistics · abstractapplic · 7 Dec 2024 19:18 UTC · 35 points, 29 comments, 5 min read · LW link

How AI Takeover Might Happen in 2 Years · joshc · 7 Feb 2025 17:10 UTC · 391 points, 131 comments, 29 min read · LW link (x.com)

Intention to Treat · Alicorn · 20 Mar 2025 20:01 UTC · 70 points, 3 comments, 2 min read · LW link

FrontierMath Score of o3-mini Much Lower Than Claimed · YafahEdelman · 17 Mar 2025 22:41 UTC · 48 points, 7 comments, 1 min read · LW link

Why Are The Human Sciences Hard? Two New Hypotheses · Aydin Mohseni, Daniel Herrmann and ben_levinstein · 18 Mar 2025 15:45 UTC · 20 points, 7 comments, 9 min read · LW link

A Path out of Insufficient Views · Unreal · 24 Sep 2024 20:00 UTC · 40 points, 53 comments, 9 min read · LW link

Anthropic: Progress from our Frontier Red Team · UnofficialLinkpostBot · 20 Mar 2025 19:12 UTC · 2 points, 0 comments, 6 min read · LW link (www.anthropic.com)

The Geometry of Linear Regression versus PCA · criticalpoints · 23 Feb 2025 21:01 UTC · 20 points, 5 comments, 6 min read · LW link (eregis.github.io)

Emergent Misalignment: Narrow finetuning can produce broadly misaligned LLMs · Jan Betley and Owain_Evans · 25 Feb 2025 17:39 UTC · 321 points, 88 comments, 4 min read · LW link