The Ethicophysics

In this sequence, we attempt to solve the alignment problem, rather than discussing it ad infinitum. Because the alignment problem is extremely difficult, this sequence will probably end up being quite long, and many of the posts will be denser and harder to read than they need to be. We apologize to the reader for this, and promise to improve the individual posts and the overall flow of the sequence as quickly as our limited time permits.

Moral Reality Check (a short story)

Agent Boundaries Aren’t Markov Blankets. [Unless they’re non-causal; see comments.]

My Alignment Research Agenda (“the Ethicophysics”)

Some Intuitions for the Ethicophysics

The Alignment Agenda THEY Don’t Want You to Know About

My Mental Model of Infohazards

[Question] Stupid Question: Why am I getting consistently downvoted?

Trying to Make a Treacherous Mesa-Optimizer

Homework Answer: Glicko Ratings for War

Enkrateia: a safe model-based reinforcement learning algorithm

A Formula for Violence (and Its Antidote)