Yeah, that’s true. I expect there to be a knowing/wanting split: an AI might be able to predict how a candidate action would fare under many slightly-conflicting notions of “alignment”, or make other long-term predictions, but that doesn’t mean it’s using those predictions to pick actions. Many people want to build AI that picks actions based on short-term considerations related to the task assigned to it.