Thane Ruthenis (Karma: 3,471)

Posts, oldest first:
Agency As a Natural Abstraction · 13 May 2022 18:02 UTC · 55 points · 9 comments · 13 min read · LW link
Reshaping the AI Industry · 29 May 2022 22:54 UTC · 147 points · 35 comments · 21 min read · LW link
Poorly-Aimed Death Rays · 11 Jun 2022 18:29 UTC · 48 points · 5 comments · 4 min read · LW link
Towards Gears-Level Understanding of Agency · 16 Jun 2022 22:00 UTC · 23 points · 4 comments · 18 min read · LW link
The Unified Theory of Normative Ethics · 17 Jun 2022 19:55 UTC · 8 points · 0 comments · 6 min read · LW link
Is This Thing Sentient, Y/N? · 20 Jun 2022 18:37 UTC · 4 points · 9 comments · 7 min read · LW link
Reframing the AI Risk · 1 Jul 2022 18:44 UTC · 26 points · 7 comments · 6 min read · LW link
Goal Alignment Is Robust To the Sharp Left Turn · 13 Jul 2022 20:23 UTC · 47 points · 16 comments · 4 min read · LW link
What Environment Properties Select Agents For World-Modeling? · 23 Jul 2022 19:27 UTC · 24 points · 1 comment · 12 min read · LW link
Convergence Towards World-Models: A Gears-Level Model · 4 Aug 2022 23:31 UTC · 38 points · 1 comment · 13 min read · LW link
Interpretability Tools Are an Attack Channel · 17 Aug 2022 18:47 UTC · 42 points · 14 comments · 1 min read · LW link
Broad Picture of Human Values · 20 Aug 2022 19:42 UTC · 42 points · 6 comments · 10 min read · LW link
AI Risk in Terms of Unstable Nuclear Software · 26 Aug 2022 18:49 UTC · 30 points · 1 comment · 6 min read · LW link
Are Generative World Models a Mesa-Optimization Risk? · 29 Aug 2022 18:37 UTC · 13 points · 2 comments · 3 min read · LW link
Greed Is the Root of This Evil · 13 Oct 2022 20:40 UTC · 18 points · 7 comments · 8 min read · LW link
Value Formation: An Overarching Model · 15 Nov 2022 17:16 UTC · 34 points · 20 comments · 34 min read · LW link
Corrigibility Via Thought-Process Deference · 24 Nov 2022 17:06 UTC · 17 points · 5 comments · 9 min read · LW link
Accurate Models of AI Risk Are Hyperexistential Exfohazards · 25 Dec 2022 16:50 UTC · 30 points · 38 comments · 9 min read · LW link
In Defense of Wrapper-Minds · 28 Dec 2022 18:28 UTC · 23 points · 38 comments · 3 min read · LW link
Internal Interfaces Are a High-Priority Interpretability Target · 29 Dec 2022 17:49 UTC · 26 points · 6 comments · 7 min read · LW link