Sorry, I don’t understand. If “reward is the optimization target” incorrectly implies that AIs would change their behavior more than they do, then the drug addict example seems orthogonal to that issue?
I didn’t say reward is the optimization target NOW! I said it might be in the future! See the other thread with Violet Hour.
Ah okay, that makes more sense to me. I assumed that you would be talking about AIs similar to current-day systems since you said that you’d updated from the behavior of current-day systems.
I am talking about AIs similar to current-day systems, for some notion of “similar” at least. But I’m imagining AIs that are trained on lots more RL, especially lots more long-horizon RL.