Formal Alignment

Alignment is typically defined loosely as “AI aligned with human intent, values, or preferences”. This developing sequence of posts investigates ways of stating alignment precisely enough that we can use mathematics to formally verify whether a proposed alignment mechanism would achieve alignment.

Formally Stating the AI Alignment Problem

Minimization of prediction error as a foundation for human values in AI alignment

Values, Valence, and Alignment

Towards deconfusing values

Deconfusing Human Values Research Agenda v1