Sorry… my other comment came out confused. I found myself producing cached responses to each paragraph, only to find that the points I was making were addressed later on. This shouldn’t be surprising, since I’ve met many Cambridge LWers and they aren’t stupid and they’re well versed in the LW worldview. So do I have a true rejection?
My summary of the post would be “The Friendly AI plan is to enforce human values by means of a singleton. This alternative plan suggests enforcing human values by means of social norms”.
My first criticism concerns correctness of values. Yudkowsky makes the case that an overly simplistic formalisation of our values will lead to a suboptimal outcome. The post seems to imply that politeness is the primary value, and that any other values we just have to hope will emerge. If this is so, will it be good enough? In particular, there might be extinction scenarios which everyone is too polite to try to prevent.
My second criticism also concerns correctness of values, in particular the issue of value drift. If social norms are too conservative then we risk stagnation and possibly totalitarianism. If social norms are too liberal then we risk value drift. There seems to be no reassurance that values would drift “in the direction we’d want them to”.
My third possible worry concerns enforcement of values. This is basically an issue of preventing an imbalance of optimisation power. We can’t easily generalise from human society—human society already contains a lot of inequality of power, and we’re all running on basically the same hardware. It’s not obvious how much those imbalances would snowball if there were substantially greater differences in hardware and mind design.
Controlling optimisation power requires controlling both access to resources and optimisation-power-per-unit-resource (which we might call “intelligence”). I guess in practice it would require controlling motivation as well? If there were any technological advances that went beyond the social norm, would society try to eliminate that information?
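To make the “snowballing” worry concrete, here is a toy sketch of my own (nothing from the post, and the numbers are made up): treat optimisation power as resources times hardware quality, with gains compounding each round. Small hardware differences stay small for a while; large ones run away quickly.

```python
# Toy model (my own illustration, not from the post): an agent's
# optimisation power is resources * intelligence, and each round it
# converts that power into more resources. Small differences in
# "hardware" (intelligence) compound over time.

def simulate(intelligences, rounds=20, growth=0.05):
    resources = [1.0 for _ in intelligences]  # everyone starts equal
    for _ in range(rounds):
        resources = [
            r + growth * r * e  # gain proportional to resources * intelligence
            for r, e in zip(resources, intelligences)
        ]
    return resources

if __name__ == "__main__":
    # Roughly human-like: everyone on nearly the same hardware.
    print(simulate([1.00, 1.01, 1.02]))  # gaps stay modest
    # Substantially different hardware / mind designs.
    print(simulate([1.0, 2.0, 4.0]))     # gaps run away exponentially
```

The point of the sketch is only that, under compounding, the size of the final imbalance is extremely sensitive to the initial spread in hardware, which is exactly the part we can’t read off from human society.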
OK. I can’t think of a true rejection for this one yet. But there are lots of possible rejections, so I think this is where a large part of the remaining difficulty lies.
My summary of the post would be “The Friendly AI plan is to enforce human values by means of a singleton. This alternative plan suggests enforcing human values by means of social norms”.
If I were going to give a one-paragraph summary of the idea, it might be something along the lines of:
“Hey kids. Welcome to the new school. You’ve all got your guns I see. Good. You’ll notice today that there are a few adults around carrying big guns and wearing shiny badges. They’ll be gone tomorrow. Tomorrow morning you’ll be on your own, and so you’ve got a choice to make. Tomorrow might be like the Gunfight at the OK Corral—bloody, and with few survivors left standing at the end of the day. Or you can come up with some rules for your society, and a means of enforcing them, that will improve the odds of most of you surviving to the end of the week. To give you time to think, devise and agree rules, we’ve provided you with some temporary sheriffs, and a draft society plan that might last a day or two. Our draft plan beats no plan, but it likely wouldn’t last a week if unimproved, so we suggest you use the stress-free day we’ve gifted you with in order to come up with an improved version. Feel free to tear it up and start from scratch, or do what you like. All we insist upon is that you take a day to think your options over carefully, before you find yourselves forced into shooting it out from survival terror.”
So, yes, there will be drift in values away from whatever definition of ‘politeness’ the human plan starts them off with. But the drift will be a planned one: a direction planned cooperatively by the AIs, with their own survival (or objectives) in mind. The $64,000 question is whether their objectives are, on average, likely to be such that a stable cooperative society is seen to be in the interests of those objectives. If so, then it seems likely they have at least as good a chance as we would have of devising a stable ruleset for the society, one that deals increasingly well with the problems of drift and power imbalance.
Whether, if a majority of the initially released AIs have some fondness for humanity, that fondness would be a preserved quantity under this sort of scenario is a secondary question (though one of high importance to our particular species), and one I’d be interested in hearing reasoned arguments on, from either side.
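On that “$64,000 question”, one standard lens is repeated-game theory: cooperation is in an agent’s interest when the discounted value of a stable society outweighs the one-off gain from defecting. A minimal sketch with illustrative payoffs of my own choosing (this is textbook iterated-prisoner’s-dilemma reasoning, not anything argued in the post):

```python
# Standard iterated-prisoner's-dilemma condition (my framing, with
# illustrative payoffs): under a grim-trigger norm ("defect once and
# everyone defects against you forever"), cooperating is in an agent's
# interest iff the discounted value of ongoing cooperation beats a
# one-off defection followed by permanent punishment.

def cooperation_is_stable(T, R, P, delta):
    """T = temptation payoff, R = reward for mutual cooperation,
    P = punishment payoff, delta = how much the agent values the future."""
    cooperate_forever = R / (1 - delta)              # R every round, discounted
    defect_once = T + delta * P / (1 - delta)        # grab T once, then P forever
    return cooperate_forever >= defect_once

if __name__ == "__main__":
    # Illustrative payoffs: T=5, R=3, P=1.
    print(cooperation_is_stable(5, 3, 1, 0.9))  # True: patient agents keep the norm
    print(cooperation_is_stable(5, 3, 1, 0.3))  # False: short-horizon agents defect
```

On this framing, whether the released AIs find a cooperative society to be in their interest depends heavily on how much they value the long run relative to a one-off grab, which is one way of sharpening the question above.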