The Ethicophysics

In this sequence, we attempt to solve the alignment problem, rather than discussing it ad infinitum. Because the alignment problem is extremely difficult, this sequence will probably end up being quite long, and many of the posts will be denser and harder to read than they need to be. We apologize to the reader for this, and promise to improve the individual posts and the overall flow of the sequence as quickly as our limited time permits.

Moral Reality Check (a short story)

Agent Boundaries Aren’t Markov Blankets. [Unless they’re non-causal; see comments.]

My Alignment Research Agenda (“the Ethicophysics”)

Some Intuitions for the Ethicophysics

The Alignment Agenda THEY Don’t Want You to Know About

My Mental Model of Infohazards

[Question] Stupid Question: Why am I getting consistently downvoted?

Trying to Make a Treacherous Mesa-Optimizer

Homework Answer: Glicko Ratings for War

Enkrateia: a safe model-based reinforcement learning algorithm

A Formula for Violence (and Its Antidote)