It’s less an issue with value drift (which does need to be solved for both goals and constraints) and more an issue with the complexity of the system.
A well-designed goal hierarchy has an upper limit of complexity. Even if the full definition of human terminal values is too complicated to fit in a single human head, it can at least be extrapolated from things that fit within multiple human brains.
Even the best set of constraint hierarchies does not share that benefit. Constraint systems in the real world are based around the complexity of our moral and ethical systems as contrasted with reality, and thus the cases can expand (literally) astronomically in relation to the total number of variations in the physical environment. Worse, these cases expand into the future and branch correspondingly. The classical example, as in The Metamorphosis of Prime Intellect or Friendship is Optimal, is an AI built by someone who does not recognize some or all non-human life. A constraint-based AGI built under the average stated legal rules of the 1950s would think nothing of tweaking every person’s sexual orientation into heterosexuality: the lack of such a constraint was obvious at that time, the goal system might well include such purposes as an incidental part of the goal, and you don’t need to explore the underlying ethical assumptions to code (or not code) that constraint.
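The astronomical expansion can be made concrete with a toy calculation (the feature counts here are mine, purely illustrative): a case-by-case constraint system has to rule on every distinguishable situation, and those multiply exponentially with the environment’s degrees of freedom.

```python
# Toy illustration (my numbers, not from the discussion): if a case-based
# constraint system must rule on every distinguishable situation, the number
# of cases grows exponentially with the environment's descriptive richness.
n_features = 30            # binary features describing a situation
cases = 2 ** n_features    # distinct situations the rule set must classify
print(cases)               # 1073741824 -- over a billion cases from just 30 bits

# Doubling the environment's descriptive richness squares the case count:
print(2 ** 60)             # 1152921504606846976 -- roughly 10^18
```

A compact goal specification, by contrast, doesn’t have to enumerate these cases; it only has to rank outcomes, which is why its size can stay bounded by what fits in human brains.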
Worse, a sufficiently powerful self-optimizer will expand into situations outside of environments the human brain could guess, or could possibly fit into the modern human head: does “A robot may not injure a human being or, through inaction, allow a human being to come to harm” prohibit or allow Zygraxis-based treatment? You or I, or anyone else with less than 10^18 working memory, can’t even imagine what that is, but it’s a heck of an ethical problem in our nondescript space-future! There’s a reason Asimov’s Three Laws stories tended to be about the constraints failing or acting unpredictably.
You also run into similar problems as in AI-Boxing: if a superhuman intellect would value something that directly conflicts with our ethical systems, it’s very hard to be smarter than it when making rules.
There may still be some useful situations for constraints in FAI theory—see the Ethical Injunctions sequence—but they don’t really make things safe in a non-FAI-complete setting.
Although some problems with value drift are related to the complexity of the system: you’re more likely to notice drift in one variable out of fifty than in one out of ten thousand. I don’t think unit tests are a good solution to Löb’s problem, though.
EDIT: You can limit the complexity of constraints by making them very broad, but then you end up with a genie that is either not very powerful, not very intelligent, or dangerous. See Problem 6 in Dreams of Friendliness.
A well-designed goal hierarchy has an upper limit of complexity.
Why is that (other than the trivial “well-designed” == “upper limit of complexity”)?
Even the best set of constraint hierarchies does not share that benefit.
I don’t understand this. Any given set of constraint hierarchies is fixed; it doesn’t have a limit. Are you saying that if you want to construct a constraint set to satisfy some arbitrary criteria, you can’t guarantee an upper complexity limit? But that seems to be true for goals as well. We have to be careful about using words like “well-designed” or “arbitrary” here.
Constraint systems in the real world are based around the complexity of our moral and ethical systems
Not necessarily. I should make myself more clear: I am not trying to constrain an AI into being friendly, I’m trying to constrain it into being safe (that is, safer or “sufficiently safe” for certain values of “sufficiently”).
Consider, for example, a constraint of “do not affect more than 10 atoms in an hour”.
Worse, a sufficiently powerful self-optimizer will expand into situations outside of environments the human brain could guess, or could possibly fit into the modern human head
True, but insofar as we’re talking about practical research and practical solutions, I’d take imperfect but existing safety measures over pie-in-the-sky theoretical assurances that may or may not get realized. If you think the Singularity is coming, you’d better do whatever you can even if it doesn’t offer ironclad guarantees.
And it’s an “AND” branch, not “OR”. It seems to me you should be working both on making sure the goals are friendly AND on constraints to mitigate the consequences of… issues with CEV/friendliness.
Why is that (other than the trivial “well-designed” == “upper limit of complexity”)?
Are you saying that if you want to construct a constraint set to satisfy some arbitrary criteria you can’t guarantee an upper complexity limit?
Sorry, I was defining “well-designed” as meaning “human-friendly”. If any group of living human individuals has a goal hierarchy that is human-friendly, that means the full set of human-friendly goals can fit within the total data structures of their brains. Indeed, the number of potential goals cannot exceed the total data space of their brains.
((If you can’t have a group of humans with human-friendly goals, then… we’re kinda screwed.))
That’s not the case for constraint-based systems. In order to be human-safe, a constraint-based system must limit the vast majority of actions; human life and value are very fragile. In order to be human-safe /and/ make decisions at the same scale a human is capable of, the constraint-based system must also allow significant patterns within the disallowed larger cases. The United States legal system, for example, is the end result of two hundred and twenty years of folks trying to establish a workable constraint system for humans. They’re still running into special cases of fairly clearly defined stuff. The situations involved require tens of thousands of human brains to store, plus countless more pages and bytes. And they still aren’t very good.
Consider, for example, a constraint of “do not affect more than 10 atoms in an hour”.
I’m not sure you could program such a thing without falling into, essentially, the AI-Box trap, and that’s not really a good bet. It’s also possible you can’t program that in any meaningful way at all while still letting the AI do anything.
((The more immediate problem is that now you’ve made a useless AGI in a way that is more complex than an AGI, meaning someone else cribs your design and makes a 20 atom/hour version, then a 30 atom/hour version, and sooner or later Jupiter is paperclips because someone forgot Avogadro’s number.))
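As a sketch of why such a cap strangles the system, here is a minimal, entirely hypothetical action gate (the `Action` type, its `atoms_affected` estimate, and the budget constant are all my inventions for illustration; producing that estimate reliably is itself a hard problem). Note that relaxing the design is a one-constant edit:

```python
# Toy sketch (assumptions mine, not from the thread): a hard "atom budget"
# gate wrapped around an agent's proposed actions. The point is how little
# it permits -- and how trivially a copied design can be relaxed.
from dataclasses import dataclass

@dataclass
class Action:
    name: str
    atoms_affected: int  # the agent's own estimate -- itself a hard problem

ATOM_BUDGET_PER_HOUR = 10  # the "10 atoms in an hour" constraint

def gate(action: Action, spent: int) -> bool:
    """Allow the action only if it fits in the remaining hourly budget."""
    return spent + action.atoms_affected <= ATOM_BUDGET_PER_HOUR

spent = 0
for act in [Action("emit one photon", 1),
            Action("move a dust grain", 10**15)]:
    if gate(act, spent):
        spent += act.atoms_affected
        print("allowed:", act.name)
    else:
        print("blocked:", act.name)
```

Almost any macroscopic act fails the check, so the gated agent can do essentially nothing; meanwhile a rival only has to change `ATOM_BUDGET_PER_HOUR` to get a less safe copy.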
True, but insofar as we’re talking about practical research and practical solutions, I’d take imperfect but existing safety measures over pie-in-the-sky theoretical assurances that may or may not get realized. If you think the Singularity is coming, you’d better do whatever you can even if it doesn’t offer ironclad guarantees.
And it’s an “AND” branch, not “OR”. It seems to me you should be working both on making sure the goals are friendly AND on constraints to mitigate the consequences of… issues with CEV/friendliness.
Point. And there are benefits to FAI theory in considering constraints. The other side of that trick is that there are downsides as well, both in terms of opportunity cost and because you’re going to see more people thinking that constraints alone can solve the problem.
The United States legal system, for example, is the end result of two hundred and twenty years of folk trying to establish a workable constraint system for humans.
Well, a lot of that was people attempting to manipulate the system for personal gain.
Well, yes, but the whole point of building AI is that it work for our gain, including deciding what that means and how to balance between persons. Basically, if you include all three branches of government in the “US legal system”, you can look at it as a very slow AI that uses brains as processor elements. Its friendliness is not quite demonstrated, but fortunately it’s not yet quite godlike.
The Hidden Complexity of Wishes is a pretty good summary of things.