This doesn’t make complete sense to me, but you are going down a line of thought I recognize.
There are certainly stable utility functions which, while having some drawbacks, don’t result in dangerous behavior from superintelligences. Finding a good one doesn’t seem all that difficult.
The really nasty challenge is how to build a superintelligence that has the utility function we want it to have. If we could do that, we could start with an extremely conservative utility function and slowly, cautiously iterate toward a balance of safety and usefulness.