Let’s assume such an AI could be created perfectly.
Wouldn’t there be a danger of freezing human values forever at the values of the society which created it?
Imagine somehow the Victorian people (using steampunk or whatever) managed to build such an AI, and that AI would forever enforce their values. Would you be happy with every single value it enforced?
So in this formulation, human values are explicitly considered to be dynamic, in constant change as people accumulate new experiences and as their environment changes. Say that the Victorians invent a steampunk version of the Internet; that’s going to cause them to have new kinds of experiences, which will in turn change their values.
Both individuals and societies also have lots of different value conflicts that they will want to resolve; see e.g. the last three paragraphs of this comment. Resolving those conflicts and helping people find the most rewarding things will naturally change their values.
Now there is still a bit of a risk of value lock-in, in that the AI is postulated to use the society’s existing values as the rule that determines what kinds of adjustments to values are acceptable. But I think that there’s an inevitable tradeoff, in that we both want to allow for value evolution, and to make sure that we don’t end up in a future that would contain nothing of value (as judged by us current-day humans). Unless we are prepared to just let anything happen (in which case why bother with Friendly AI stuff in the first place?), we need to have our existing values guide some of the development process.
I don’t want to speak for the original author, but presumably the AI would take into account that Victorian society’s culture was changing based on its interactions with the AI, and it would try to safeguard the new, updated values... until such a time as those new values became obsolete as well.
In other words, it sounds like under this scheme the AI’s conception of human values would not be hardcoded. Instead, it would observe our affect to see which new activities had become terminally valued in their own right, ones we were intrinsically happy to participate in, and it would adapt to this change in human culture to facilitate those new activities.
That said, I’m still unsure about how one could guarantee that the AI could not hack its own “human affect detector” to make it very easy for itself by forcing smiles on everyone’s face under torture and defining torture as the preferred human activity.
I endorse this comment.
That’s a valid question, but note that it’s asking a different question than the one that this model is addressing. (This model asks “what are human values and what do we want the AI to do with them”, your question here is “how can we prevent the AI from wireheading itself in a way that stops it doing the things that we want it to do”. “What” versus “how”.)