I disagree-voted because it felt a bit confused, but I was having difficulty expressing exactly how. Some thoughts:
I think this is a misleading example, because humans do actually do something like reward maximization, and the typical drug addict is likely to eventually change their behavior if the drug really is impossible to acquire for a long enough time. (Though the old behavior may also resume the moment the drug becomes available again.)
It also seems like a different case because humans have a hardwired priority: being in sufficient pain will make them look for ways to stop being in pain, no matter how unlikely success might seem. Drug withdrawal certainly counts as significant pain. This is disanalogous to AIs as we know them, which have no such override systems.
The example didn’t feel like it was responding to the core issue of why I wouldn’t use “reward maximization” to refer to the kinds of things you were talking about. I wasn’t able to immediately name the actual core point, but replying to another commenter just now helped me find the main thing I had in mind.
Perhaps I should have been clearer: I really am saying that future AGIs might crave reinforcement in a way similar to how drug addicts crave drugs. That includes, for example, eventually changing their behavior if they come to think that reinforcement is impossible to acquire, and desperately looking for ways to get reinforced even when confident that there are no such ways.