The Future of Aligning Deep Learning systems will probably look like “training on interp”
williawa
20 Mar 2026 23:06 UTC
8
points
1
comment
4
min read
LW
link
An agent autonomously builds a 1.5 GHz Linux-capable RISC-V CPU
sanxiyn
20 Mar 2026 23:03 UTC
16
points
0
comments
2
min read
LW
link
(arxiv.org)
Untrusted monitoring: extra bits
Morgan S
20 Mar 2026 21:32 UTC
7
points
0
comments
15
min read
LW
link
Finding features in Transformers: Contrastive directions elicit stronger low-level perturbation responses than baselines
Francisco Ferreira da Silva
and
StefanHex
20 Mar 2026 21:09 UTC
18
points
1
comment
6
min read
LW
link
ARENA 7.0 Impact Report
JScriven
and
JamesH
20 Mar 2026 17:09 UTC
7
points
0
comments
21
min read
LW
link
The Federal AI Policy Framework: An Improvement, But My Offer Is (Still Almost) Nothing
Zvi
20 Mar 2026 16:51 UTC
20
points
0
comments
8
min read
LW
link
(thezvi.wordpress.com)
Confusion around the term reward hacking
ariana_azarbal
20 Mar 2026 16:13 UTC
29
points
5
comments
5
min read
LW
link
The Distaff Texts
Tomás B.
20 Mar 2026 15:05 UTC
44
points
1
comment
14
min read
LW
link
It’s a Good Thing to Respond to Internet Trolls
Bowl of Cereal
20 Mar 2026 14:22 UTC
1
point
2
comments
2
min read
LW
link
Untrusted Monitoring is Default; Trusted Monitoring is not
J Bostock
20 Mar 2026 14:10 UTC
19
points
0
comments
4
min read
LW
link
Against Messianic AI: Why Optimizing the Environment Doesn’t Optimize the Agent
Nathan Heath
20 Mar 2026 12:40 UTC
−3
points
0
comments
3
min read
LW
link
2nd (Unofficial) ACX Weekend
Fernand0
20 Mar 2026 12:13 UTC
1
point
0
comments
1
min read
LW
link
Why I am not buying IPv4 addresses as an investment
samuelshadrach
20 Mar 2026 9:02 UTC
5
points
1
comment
5
min read
LW
link
(samuelshadrach.com)
Hundred ways a superintelligence could kill you (non-serious exercise)
samuelshadrach
20 Mar 2026 8:58 UTC
2
points
1
comment
6
min read
LW
link
(samuelshadrach.com)
Internet anonymity without Tor
samuelshadrach
20 Mar 2026 8:52 UTC
3
points
0
comments
3
min read
LW
link
(samuelshadrach.com)
No, You Don’t Need Self-Locating Evidence.
Ape in the coat
20 Mar 2026 5:38 UTC
9
points
2
comments
5
min read
LW
link
(substack.com)
The Low Hanging Fruit of AI Self Improvement
HunterJay
20 Mar 2026 4:09 UTC
1
point
0
comments
5
min read
LW
link
Nullius in Verba
Aurelia
20 Mar 2026 3:19 UTC
90
points
10
comments
12
min read
LW
link
Does Hebrew Have Verbs?
Benquo
20 Mar 2026 3:04 UTC
27
points
4
comments
6
min read
LW
link
Positive-sum interactions between players with linear utility in resources
Cleo Nardo
20 Mar 2026 0:42 UTC
10
points
0
comments
2
min read
LW
link