Talk of broad basins in this community reminds me of the claims made about corrigible AI: that if you could build an AGI that is at least somewhat willing to cooperate with humans in adjusting its own preferences to better align with ours, it would fall into a broad basin of corrigibility and become more aligned with human values over time.
I don’t suppose you could link me to a post arguing for this.
I was thinking of paulfchristiano's post on corrigibility (https://www.lesswrong.com/posts/fkLYhTQteAu5SinAc/corrigibility):

"A benign act-based agent will be robustly corrigible if we want it to be."

"A sufficiently corrigible agent will tend to become more corrigible and benign over time. Corrigibility marks out a broad basin of attraction towards acceptable outcomes."
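For what it's worth, "broad basin of attraction" is a dynamical-systems metaphor. Here is a minimal sketch, assuming a made-up 1D system (nothing below comes from Christiano's post): every state starting on the wide side of a ridge settles into the same stable point, which is the intuition behind "a sufficiently corrigible agent will tend to become more corrigible."

```python
# Toy illustration of a "basin of attraction" (hypothetical dynamics,
# chosen for this example): the state x drifts downhill on a potential
# whose negative gradient is -x(x-2)(x-5). That gives a stable fixed
# point at x = 0 whose basin is all x < 2 (the "broad basin"), and a
# second stable point at x = 5 that attracts only starts above the
# ridge at x = 2.

def dxdt(x):
    """Negative gradient of the potential; stable points at 0 and 5."""
    return -x * (x - 2) * (x - 5)

def evolve(x0, dt=0.01, steps=5000):
    """Integrate the dynamics forward with simple Euler steps."""
    x = x0
    for _ in range(steps):
        x += dt * dxdt(x)
    return x

if __name__ == "__main__":
    for x0 in [-1.0, 0.5, 1.9, 2.1, 6.0]:
        print(f"start {x0:+.1f} -> settles near {evolve(x0):+.2f}")
    # Starts at -1.0, 0.5, and 1.9 all land at 0; starts at 2.1 and 6.0
    # land at 5. The corrigibility claim is that the acceptable fixed
    # point's basin is wide, so imperfect starting points still converge.
```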