Value Learning

This is a sequence investigating the feasibility of one approach to AI alignment: value learning.

Preface to the sequence on value learning

Ambitious Value Learning

What is ambitious value learning?

The easy goal inference problem is still hard

Humans can be assigned any values whatsoever…

Latent Variables and Model Mis-Specification

Model Mis-specification and Inverse Reinforcement Learning

Future directions for ambitious value learning

Goals vs Utility Functions

Ambitious value learning aims to give the AI the correct utility function in order to avoid catastrophe. Given how difficult this is, we revisit the arguments for using utility functions in the first place.

Intuitions about goal-directed behavior

Coherence arguments do not imply goal-directed behavior

Will humans build goal-directed agents?

AI safety without goal-directed behavior

Narrow Value Learning

What is narrow value learning?

Ambitious vs. narrow value learning

Human-AI Interaction

Reward uncertainty

The human side of interaction

Following human norms

Future directions for narrow value learning

Conclusion to the sequence on value learning