Value Learning

This is a sequence investigating the feasibility of one approach to AI alignment: value learning.

Preface to the sequence on value learning

1. Ambitious Value Learning

What is ambitious value learning?

The easy goal inference problem is still hard

Humans can be assigned any values whatsoever…

Latent Variables and Model Mis-Specification

Model Mis-specification and Inverse Reinforcement Learning

Future directions for ambitious value learning

2. Goals vs Utility Functions

Ambitious value learning aims to give the AI the correct utility function in order to avoid catastrophe. Given its difficulty, we revisit the arguments for using utility functions in the first place.

Intuitions about goal-directed behavior

Coherence arguments do not imply goal-directed behavior

Will humans build goal-directed agents?

AI safety without goal-directed behavior

3. Narrow Value Learning

What is narrow value learning?

Ambitious vs. narrow value learning

Human-AI Interaction

Reward uncertainty

The human side of interaction

Following human norms

Future directions for narrow value learning

Conclusion to the sequence on value learning