Luck based medicine: inositol
Elizabeth · 22 Sep 2023 20:10 UTC · 14 points · 0 comments · 3 min read · (acesounderglass.com)
If influence functions are not approximating leave-one-out, how are they supposed to help?
Fabien Roger · 22 Sep 2023 14:23 UTC · 40 points · 2 comments · 3 min read
Modeling p(doom) with TrojanGDP
K. Liam Smith · 22 Sep 2023 14:19 UTC · −2 points · 2 comments · 13 min read
Let’s talk about Impostor syndrome in AI safety
Igor Ivanov · 22 Sep 2023 13:51 UTC · 31 points · 1 comment · 3 min read
Fund Transit With Development
jefftk · 22 Sep 2023 11:10 UTC · 31 points · 3 comments · 3 min read · (www.jefftk.com)
Atoms to Agents Proto-Lectures
johnswentworth · 22 Sep 2023 6:22 UTC · 56 points · 4 comments · 2 min read · (www.youtube.com)
Would You Work Harder In The Least Convenient Possible World?
Firinn · 22 Sep 2023 5:17 UTC · 39 points · 20 comments · 9 min read
Contra Kevin Dorst’s Rational Polarization
azsantosk · 22 Sep 2023 4:28 UTC · 6 points · 1 comment · 9 min read
What social science research do you want to see reanalyzed?
Michael Wiebe · 22 Sep 2023 0:03 UTC · 10 points · 7 comments · 1 min read
Immortality or death by AGI
ImmortalityOrDeathByAGI · 21 Sep 2023 23:59 UTC · 42 points · 21 comments · 4 min read
Neel Nanda on the Mechanistic Interpretability Researcher Mindset
Michaël Trazzi · 21 Sep 2023 19:47 UTC · 34 points · 1 comment · 3 min read · (theinsideview.ai)
Require AGI to be Explainable
PeterMcCluskey · 21 Sep 2023 16:11 UTC · 5 points · 0 comments · 6 min read · (bayesianinvestor.com)
Update to “Dominant Assurance Contract Platform”
moyamo · 21 Sep 2023 16:09 UTC · 26 points · 1 comment · 1 min read
Sparse Autoencoders: Future Work
Logan Riggs and Aidan Ewart · 21 Sep 2023 15:30 UTC · 13 points · 0 comments · 6 min read
Sparse Autoencoders Find Highly Interpretable Directions in Language Models
Logan Riggs, Hoagy, Aidan Ewart and Robert_AIZI · 21 Sep 2023 15:30 UTC · 97 points · 5 comments · 5 min read
There should be more AI safety orgs
Marius Hobbhahn · 21 Sep 2023 14:53 UTC · 116 points · 6 comments · 17 min read
[Question] How are rationalists or orgs blocked, that you can see?
Nathan Young · 21 Sep 2023 2:37 UTC · 7 points · 2 comments · 1 min read
Notes on ChatGPT’s “memory” for strings and for events
Bill Benzon · 20 Sep 2023 18:12 UTC · 3 points · 0 comments · 10 min read
Belief and the Truth
Sam I am · 20 Sep 2023 17:38 UTC · 2 points · 12 comments · 5 min read · (open.substack.com)
Image Hijacks: Adversarial Images can Control Generative Models at Runtime
Scott Emmons, Luke Bailey and Euan Ong · 20 Sep 2023 15:23 UTC · 55 points · 8 comments · 1 min read · (arxiv.org)