I was thinking of paulfchristiano’s articles on corrigibility (https://www.lesswrong.com/posts/fkLYhTQteAu5SinAc/corrigibility):
In this post I claim:
A benign act-based agent will be robustly corrigible if we want it to be.
A sufficiently corrigible agent will tend to become more corrigible and benign over time. Corrigibility marks out a broad basin of attraction towards acceptable outcomes.