Seems like an ASI that’s a hot mess wouldn’t be very useful, and therefore effectively not superintelligent. It seems like goal coherence is almost fundamentally part of what we mean by ASI.
You could hypothetically have a superintelligent thing that only answers questions and doesn’t pursue goals. But that would just be turned into a goal-seeking agent by asking it “what would you do if you had this goal and these tools...?”
This is approximately what we’re doing with making LLMs more agentic through training and scaffolding.
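A minimal sketch of that point, under loose assumptions: `ask_model` and `run_tool` below are hypothetical stand-ins (a stubbed question-answering oracle and a toy tool executor, not any real API), but the loop structure is roughly how scaffolding turns a system that only answers questions into something that pursues a goal.

```python
# Sketch: wrapping a question-answering oracle in a scaffolding loop so that
# its answers get executed as actions. ask_model and run_tool are hypothetical
# placeholders, not real APIs.

def ask_model(prompt: str) -> str:
    """Hypothetical oracle: only answers the question it is given."""
    # A real system would call an LLM here; this stub just returns a fixed action.
    return "search: current weather in Berlin"

def run_tool(action: str) -> str:
    """Toy tool executor: parses 'tool: argument' strings and returns a fake result."""
    tool, _, arg = action.partition(": ")
    return f"[result of {tool}({arg!r})]"

def agent_loop(goal: str, max_steps: int = 5) -> list[str]:
    """Repeatedly ask 'what would you do next, given this goal and these tools?'
    and execute whatever the oracle answers."""
    history: list[str] = []
    for _ in range(max_steps):
        prompt = (
            f"Goal: {goal}\n"
            f"History so far: {history}\n"
            "What single action would you take next, given these tools?"
        )
        action = ask_model(prompt)       # the oracle merely answers a question...
        observation = run_tool(action)   # ...but the loop turns that answer into behavior
        history.append(f"{action} -> {observation}")
    return history

if __name__ == "__main__":
    for step in agent_loop("plan a trip to Berlin"):
        print(step)
```

The oracle itself never “wants” anything; goal pursuit comes entirely from the outer loop feeding its own answers back as actions.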
I agree that in order to realize its full economic value, an ASI would need to be coherent in the senses of:
pursuing a goal over a long time horizon
doing so under both normal operating conditions and conditions that are adversarial with respect to the inputs other agents in the environment can expose the ASI to
I.e., other agents might try to trick the ASI into abandoning its goal and doing some other thing instead (like emptying its bank account), and the ASI would need to be able to resist this
However, there are notions of coherence that are not covered by this (e.g. robustness to an adversary with full control over the model and its scaffolding, or the ability to consistently cover up a hidden agenda without IID training to do so).