The LessWrong 2022 Review
habryka · 5 Dec 2023 4:00 UTC · 59 points · 2 comments · 4 min read

Accelerating science through evolvable institutions (rootsofprogress.org)
jasoncrawford · 4 Dec 2023 23:21 UTC · 12 points · 2 comments · 6 min read

Speaking to Congressional staffers about AI risk
Akash and hath · 4 Dec 2023 23:08 UTC · 137 points · 3 comments · 16 min read

Interview with Vanessa Kosoy on the Value of Theoretical Research for AI
WillPetillo · 4 Dec 2023 22:58 UTC · 23 points · 0 comments · 35 min read

2023 Alignment Research Updates from FAR AI (far.ai)
AdamGleave and EuanMcLean · 4 Dec 2023 22:32 UTC · 9 points · 0 comments · 8 min read

n of m ring signatures (danielfilan.com)
DanielFilan · 4 Dec 2023 20:00 UTC · 42 points · 7 comments · 1 min read

[Question] Why using activation for interpreting GPT-2?
sprout_ust · 4 Dec 2023 18:49 UTC · 1 point · 0 comments · 1 min read

Mechanistic interpretability through clustering
Alistair Fraser · 4 Dec 2023 18:49 UTC · 1 point · 0 comments · 1 min read

Agents which are EU-maximizing as a group are not EU-maximizing individually
Mlxa · 4 Dec 2023 18:49 UTC · 3 points · 2 comments · 2 min read

Planning in LLMs: Insights from AlphaGo
jco · 4 Dec 2023 18:48 UTC · 3 points · 1 comment · 11 min read

Non-classic stories about scheming (Section 2.3.2 of “Scheming AIs”)
Joe Carlsmith · 4 Dec 2023 18:44 UTC · 8 points · 0 comments · 20 min read

5. The Mutable Values Problem in Value Learning and CEV
RogerDearnaley · 4 Dec 2023 18:31 UTC · 4 points · 0 comments · 47 min read

[Valence series] 1. Introduction
Steven Byrnes · 4 Dec 2023 15:40 UTC · 59 points · 4 comments · 15 min read

Hashmarks: Privacy-Preserving Benchmarks for High-Stakes AI Evaluation (arxiv.org)
Paul Bricman · 4 Dec 2023 7:31 UTC · 11 points · 5 comments · 16 min read

A call for a quantitative report card for AI bioterrorism threat models
Juno · 4 Dec 2023 6:35 UTC · 11 points · 0 comments · 10 min read

FTL travel summary
Isaac King · 4 Dec 2023 5:17 UTC · 0 points · 3 comments · 3 min read

the micro-fulfillment cambrian explosion (www.bhauth.com)
bhauth · 4 Dec 2023 1:15 UTC · 49 points · 4 comments · 4 min read

Nietzsche’s Morality in Plain English (arjunpanickssery.substack.com)
Arjun Panickssery · 4 Dec 2023 0:57 UTC · 62 points · 8 comments · 4 min read

Meditations on Mot (www.mindthefuture.info)
Richard_Ngo · 4 Dec 2023 0:19 UTC · 43 points · 5 comments · 8 min read

The Witness (www.narrativeark.xyz)
Richard_Ngo · 3 Dec 2023 22:27 UTC · 70 points · 2 comments · 14 min read