Interesting, thanks for this. Hmmm. I’m not sure this distinction between internally modelling the whole problem vs. acting in feedback loops is helpful—won’t the AIs almost certainly be modelling the whole problem, once they reach a level of general competence not much higher than what they have now? They are pretty situationally aware already.
Yeah, that’s true. I expect there to be a knowing/wanting split: an AI might be able to predict how a candidate action would affect many slightly-conflicting notions of “alignment”, or make other long-term predictions, but that doesn’t mean it’s using those predictions to pick actions. Many people want to build AI that picks actions based on short-term considerations related to the task it’s been assigned.