An important caveat is that stated preferences being coherent doesn’t immediately imply that behavior in other situations will be consistent with those preferences. Still, this should be an update towards agentic AI systems in the near future being goal-directed in the spooky consequentialist sense.