Interesting, thanks for this. Hmmm. I’m not sure this distinction between internally modelling the whole problem vs. acting in feedback loops is helpful—won’t the AIs almost certainly be modelling the whole problem, once they reach a level of general competence not much higher than what they have now? They are pretty situationally aware already.
Yeah, that’s true. I expect there to be a knowing/wanting split: an AI might be able to predict how a candidate action would affect many slightly-conflicting notions of “alignment”, or make other long-term predictions, but that doesn’t mean it’s using those predictions to pick actions. Many people want to build AI that picks actions based on short-term considerations related to the task it’s been assigned.