This comment raises some good points, but even “there will be a natural pressure for [subprocesses] to resemble a corrigible agent” seems debatable. Consider the restaurant setting again. It is sometimes necessary for restaurants to close temporarily for renovation: to increase seating capacity, upgrade equipment, etc. The head chef who decides to renovate will be making the instrumental goals of all the other chefs (make good food, earn money to stay alive) untenable while they are furloughed. More generally, progress toward terminal goals is not monotonic, so focusing only on the local topology of the optimization landscape may be insufficient to predict long-horizon trends.
This seems right. Some sub-properties of corrigibility, such as not subverting the higher-level process and being shutdownable, should be expected in well-constructed sub-processes. But corrigibility is probably about more than just that (e.g. perhaps myopia), and we should be careful not to assume that well-constructed sub-processes that resemble agents will have all the corrigibility properties.
To be fair, I think shutdownability and not subverting higher-level goals were the original motivations for corrigibility research in the first place, so this is a good thing.