In particular: the motivations that matter most for safe instruction-following are not the AI’s long-term consequentialist motivations (indeed, if possible, I think we mostly want to avoid our AIs having this kind of motivation except insofar as it is implied by safe instruction-following).
That seems like a reasonable position, given that you acknowledge the risk from long-term motivations. But it doesn’t seem to be what people are actually aiming for. In particular, people seem to be aiming for agentic AI that can act on a person’s behalf over longer time scales. And METR’s trend predictions seem to point to longer horizons soon.
I have already remarked in another comment that a short time horizon failed to prevent GPT-4o(!) from getting its user to post messages into the wilderness. Does this mean that some kind of long-term motive has already appeared?