Thane Ruthenis (Karma: 3,471)

Posts, oldest first:
Agency As a Natural Abstraction · 13 May 2022 18:02 UTC · 55 points · 9 comments · 13 min read · LW link
Reshaping the AI Industry · 29 May 2022 22:54 UTC · 147 points · 35 comments · 21 min read · LW link
Poorly-Aimed Death Rays · 11 Jun 2022 18:29 UTC · 48 points · 5 comments · 4 min read · LW link
Towards Gears-Level Understanding of Agency · 16 Jun 2022 22:00 UTC · 23 points · 4 comments · 18 min read · LW link
The Unified Theory of Normative Ethics · 17 Jun 2022 19:55 UTC · 8 points · 0 comments · 6 min read · LW link
Is This Thing Sentient, Y/N? · 20 Jun 2022 18:37 UTC · 4 points · 9 comments · 7 min read · LW link
Reframing the AI Risk · 1 Jul 2022 18:44 UTC · 26 points · 7 comments · 6 min read · LW link
Goal Alignment Is Robust To the Sharp Left Turn · 13 Jul 2022 20:23 UTC · 47 points · 16 comments · 4 min read · LW link
What Environment Properties Select Agents For World-Modeling? · 23 Jul 2022 19:27 UTC · 24 points · 1 comment · 12 min read · LW link
Convergence Towards World-Models: A Gears-Level Model · 4 Aug 2022 23:31 UTC · 38 points · 1 comment · 13 min read · LW link
Interpretability Tools Are an Attack Channel · 17 Aug 2022 18:47 UTC · 42 points · 14 comments · 1 min read · LW link
Broad Picture of Human Values · 20 Aug 2022 19:42 UTC · 42 points · 6 comments · 10 min read · LW link
AI Risk in Terms of Unstable Nuclear Software · 26 Aug 2022 18:49 UTC · 30 points · 1 comment · 6 min read · LW link
Are Generative World Models a Mesa-Optimization Risk? · 29 Aug 2022 18:37 UTC · 13 points · 2 comments · 3 min read · LW link
Greed Is the Root of This Evil · 13 Oct 2022 20:40 UTC · 18 points · 7 comments · 8 min read · LW link
Value Formation: An Overarching Model · 15 Nov 2022 17:16 UTC · 34 points · 20 comments · 34 min read · LW link
Corrigibility Via Thought-Process Deference · 24 Nov 2022 17:06 UTC · 17 points · 5 comments · 9 min read · LW link
Accurate Models of AI Risk Are Hyperexistential Exfohazards · 25 Dec 2022 16:50 UTC · 30 points · 38 comments · 9 min read · LW link
In Defense of Wrapper-Minds · 28 Dec 2022 18:28 UTC · 23 points · 38 comments · 3 min read · LW link
Internal Interfaces Are a High-Priority Interpretability Target · 29 Dec 2022 17:49 UTC · 26 points · 6 comments · 7 min read · LW link