Is it possible that making an expected utility maximizer might be less dangerous than making something which isn’t? Consider as an alternative an expected log utility maximizer (an agent using the Kelly Criterion, or some approximation of it).
The sooner an AI wins, the more galaxies it can consume. The expected utility maximizer weighs those extra galaxies against the risk of failure, and is willing to adopt plans with a much higher probability of failure than a log utility maximizer would accept. Like SBF, it would take bets that have a 50% chance of more-than-doubling its utility and a 50% chance of losing it all. In many environments this strategy almost certainly ends in failure, as the agent goes double-or-nothing until it loses everything. That means the effects of the AI are mitigated in most futures.
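To make the double-or-nothing dynamic concrete, here is a minimal simulation sketch (my own illustration; the 2.1x payoff and the other numbers are chosen for the example, not taken from anything above). Each round, a staked amount either returns 2.1x or is lost, 50/50, so every bet has positive expected value. A bettor maximizing expected wealth stakes everything each round, while a Kelly bettor stakes the log-optimal fraction:

```python
import random

P_WIN, NET_ODDS = 0.5, 1.1                        # 50% chance to win 1.1x the stake, else lose it
KELLY_FRACTION = P_WIN - (1 - P_WIN) / NET_ODDS   # log-optimal fraction of wealth, ~0.045

def run(fraction, rounds=1_000, wealth=1.0):
    """Stake `fraction` of current wealth each round; return final wealth."""
    for _ in range(rounds):
        stake = fraction * wealth
        if random.random() < P_WIN:
            wealth += NET_ODDS * stake            # win: the stake comes back as 2.1x
        else:
            wealth -= stake                       # lose: the stake is gone
    return wealth

random.seed(0)
trials = 2_000
all_in_busts = sum(run(1.0) == 0.0 for _ in range(trials))         # linear-EU bettor stakes everything
kelly_wealth = sorted(run(KELLY_FRACTION) for _ in range(trials))  # Kelly bettor

print(f"all-in bettor bankrupt in {all_in_busts / trials:.1%} of runs")
print(f"median Kelly bettor wealth after 1,000 rounds: {kelly_wealth[trials // 2]:.2f}")
```

With these numbers the all-in bettor is bankrupt in essentially every run (surviving would require winning 1,000 flips in a row), while the Kelly bettor's median wealth grows slowly but never hits zero.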
The log utility maximizer, by contrast, plans carefully and succeeds in most or all futures. For a misaligned AI, that looks like humanity dying with near-certainty.
A hyper-expected utility maximizer (an AI which maximizes expected exp(utility) or similar) would be even safer. Instead of patiently deceiving you into letting it out of the box, it asks nicely or tries something crazy, because if that works, it works faster than deception, and faster means more galaxies.
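A toy calculation (again my own sketch; the plans and numbers are invented for illustration) shows the spectrum of risk attitudes. Faced with the same three plans, the log-utility agent picks the safe one, the linear expected utility maximizer picks a positive-expected-value long shot, and the exp-utility agent picks a jackpot plan that is terrible even in expectation:

```python
from math import log, exp

# Each plan is a list of (probability, galaxies) outcomes.
# The failure branches leave 1 galaxy so that log utility stays finite.
plans = {
    "safe":      [(1.00, 10)],              # certain 10 galaxies
    "long shot": [(0.10, 100), (0.90, 1)],  # 10.9 galaxies in expectation, usually fails
    "wild shot": [(0.01, 200), (0.99, 1)],  # 2.99 galaxies in expectation, huge jackpot
}

def expected(utility, plan):
    return sum(p * utility(x) for p, x in plan)

for name, u in [("log", log), ("linear", lambda x: x), ("exp", exp)]:
    best = max(plans, key=lambda k: expected(u, plans[k]))
    print(f"the {name}-utility agent picks the {best} plan")
```

In this example the log agent keeps the galaxies it already has, the linear agent takes the long shot because the upside slightly outweighs the 90% chance of near-total failure, and the exp agent is pulled toward the most extreme jackpot even at terrible odds.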
So if we had to choose between living in the world of a superintelligent expected log(resources) maximizer and the world of a superintelligent expected utility maximizer, we should maybe go for the one that leaves us alive in more futures.
Of course, of the three, the expected-log-utility agent would also appear the most capable and useful, while the hyper-expected utility maximizer would be near-useless.
However, the side effects of the agent failing might still kill us.
For example, the failed plan could be something like “build a huge device which, with probability 20%, enables faster-than-light travel (which would allow colonizing more galaxies), and, with probability 80%, causes false vacuum collapse or otherwise destroys the entire universe”.
Or something on a smaller scale, where failure means blowing up the Earth, destroying all life, etc.