This seems right. Some sub-properties of corrigibility, such as not subverting the higher-level process and being shutdownable, should be expected in well-constructed sub-processes. But corrigibility is probably about more than just that (e.g. perhaps myopia), and we should be careful not to assume that well-constructed sub-processes resembling agents will inherit all the corrigibility properties.
To be fair, I think an AI being shutdownable and not subverting higher-level goals was the original motivation for all the corrigibility research, so this is a good thing.