Is your view closer to:
there are two hard steps (instruction following, value alignment), and of the two instruction following is much more pressing
instruction following is the only hard step; if you get that, value alignment is almost certain to follow
The first. Value alignment is much harder. But it will be vastly easier with smarter-than-human help. So there are two difficult steps, and it’s clear which one should be tackled first.
The difficulty with value alignment lies both in figuring out what we actually want and in making those values stable in a mind that changes as it learns new things.