I question both of these premises. It could be like you or me, in the sense that it simply executes a sequence of actions with no coherent or constant driving utility function (even long-term goals are often inconsistent with one another), and even if you could demonstrate to it a utility function that met some extremely high standard, it would not be persuaded to adopt it. Building in such a utility function might be possible, but not necessarily natural; in fact, I would bet it would be unnatural and difficult.
I understand that your rebuttal to “friendliness research is too premature to be useful” is “It is important enough to risk being premature”, but I hope you can agree that a stronger argument would offer stronger evidence that the risk of prematurity is not particularly large.
But let’s leave that aside. I’ll concede that, under some circumstances, developing a strong friendliness theory before strong AI could be the only path to safe AI.
I still think it is mistaken to ignore intermediate scenarios and focus only on that case. I wrote about this before in a post, How to Study AGIs safely, which you commented on.
It could be like you or me, in the sense that it simply executes a sequence of actions with no coherent or constant driving utility function...
I doubt the first AGI will be like this, unless you count WBE as AGI. But if it is, that’s very bad news, since it would be very difficult to make such an AGI friendly. It would be akin to an alien species that evolved under conditions vastly different from ours: it would probably have very different values.