Ah okay, that makes more sense to me. I assumed you were talking about AIs similar to current-day systems, since you said that you’d updated from the behavior of current-day systems.
I am talking about AIs similar to current-day systems, for some notion of “similar” at least. But I’m imagining AIs that are trained with lots more RL, especially lots more long-horizon RL.
I didn’t say reward is the optimization target NOW! I said it might be in the future! See the other chain/thread with Violet Hour.