Formal Alignment

Alignment is typically defined loosely as “AI aligned with human intent, values, or preferences”. This developing sequence of posts investigates ways of stating alignment precisely enough that we can use mathematics to formally verify whether a proposed alignment mechanism would achieve alignment.

Formally Stating the AI Alignment Problem

Minimization of prediction error as a foundation for human values in AI alignment

Values, Valence, and Alignment

Towards deconfusing values

Deconfusing Human Values Research Agenda v1