RogerDearnaley comments on The alignment stability problem

RogerDearnaley 5 Dec 2023 10:26 UTC
LW: 3 AF: 2
Approaches to alignment stability
I view this as pretty much a solved problem, solved by value learning, though issues then remain due to the mutability of human values.
- Seth Herd 19 Sep 2024 16:06 UTC
  2 points
  As per our discussions on our other posts, I don’t think we can say that value learning in itself solves the problem. The issue of whether the ASI’s interpretation of its central goal or instructions might change is not automatically solved by adopting that approach. The value mutability problem you link to is a separate issue. I’m not addressing here whether human values might change, but whether an AGI’s interpretation of its central goal/values might change.