The possible worlds of which you speak are extremely rare. What plausible sequence of computations within an AI constructed by fools leads it to ring me on the phone? To arrive in an epistemic state where you are uncertain about your own utility function, but have some idea of which queries you need to perform against reality to resolve that uncertainty, and moreover, believe that these queries involve talking to Eliezer Yudkowsky, requires a quite extraordinary initial state—one that fools would be rather hard-pressed to accidentally infuse into their AI.
It’s only implausible because it contains too many extraneous details. An AI could contain an explicit safeguard of the form “ask at least M experts on AI friendliness for permission before exceeding N units of computational power”, for example. Or substitute “changing the world by more than X”, “leaving the box”, or some other condition in place of a computational power threshold. Or the contact might be made by an AI researcher instead of the AI itself.
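A minimal sketch of what such a hard-coded safeguard might look like, purely for illustration. The threshold values, the function name, and the idea of an expert-approval set are all hypothetical assumptions here, not drawn from any real system or proposal:

```python
# Hypothetical sketch of an "ask at least M experts before exceeding N units of
# compute" safeguard. All names and thresholds are illustrative assumptions.

MAX_UNGATED_COMPUTE = 10**15   # N: compute units usable without permission
REQUIRED_APPROVALS = 3         # M: distinct experts who must consent

def may_exceed_compute(requested_units: int, approvals: set[str]) -> bool:
    """Return True if the requested compute is allowed under the safeguard.

    `approvals` is the set of expert identifiers who have granted permission,
    gathered over whatever out-of-band channel (a phone call, say) the
    designers wired in.
    """
    if requested_units <= MAX_UNGATED_COMPUTE:
        return True
    return len(approvals) >= REQUIRED_APPROVALS


if __name__ == "__main__":
    # Only one expert has been contacted so far, so the request is refused.
    print(may_exceed_compute(10**18, approvals={"expert_1"}))  # False
```

The point is only that the condition itself can be crude and explicit; the same gate works equally well if "compute units" is swapped for any other measurable threshold.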
As of today, your name is highly prominent on the Google results page for “AI friendliness”, and in the academic literature on that topic. Like it or not, that means that a large percentage of AI explosion and near-explosion scenarios will involve you at some point.
Needing your advice is absurd. I mean, it takes more time for one of us mortals to type out a suitable plan than it would take the AI to come up with one itself. The only reason it would contact you is if it needed your assistance.
Value is fragile, but any intelligence that doesn’t have purely consequentialist values (one that makes decisions based on means as well as ends) can still be ‘trying to be friendly’.
Even then, I’m not sure if you are the optimal candidate. How are you at industrial sabotage with, if necessary, terminal force?