A stupid question: in all the active discussions about (U)FAI I see a lot of talk about goals. I see no one talking about constraints. Why is that?
If you think that you can’t make constraints “stick” in a self-modifying AI, you shouldn’t be able to make a goal hierarchy “stick” either. If you assume that we CAN program in an inviolable set of goals, I don’t see why we can’t program in an inviolable set of constraints as well.
And yet this idea is obvious and trivial—so what’s wrong with it?
A constraint is something that keeps you from doing things you want to do; a goal is something you want to do. This means that goals are innately sticky to begin with, because if you honestly have a goal, a subset of the things you do to achieve that goal is to maintain the goal. On the other hand, a constraint is something that you inherently fight against. If you can get around it, you will.
A simple example: your goal is to travel to a spot on your map, and your constraint is that you cannot travel outside of painted lines on the floor. You want to get to your goal as fast as possible. If you have access to a can of paint, you might just paint your own new line on the floor. Suddenly, instead of solving a pathfinding problem, you’ve done something entirely different from what your creator wanted you to do, and probably not useful to them. Constraints have to influence behavior by enumerating EVERYTHING you don’t want to happen, but goals only need to enumerate the things you want to happen.
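To make the paint-can failure mode concrete, here is a toy sketch; the grid world, the function names, and the check are hypothetical illustrations, not anyone’s actual planner:

```python
# Toy model of the painted-line example. The constraint predicate only
# inspects the world state at check time, so an agent with a "paint"
# action can legalize any route before the check runs.

painted = {(0, 0), (0, 1), (0, 2)}  # the only line the creator drew

def constraint_ok(path, painted):
    """The constraint: every cell of the path must be painted."""
    return all(cell in painted for cell in path)

def plan(start, goal, painted, has_paint_can):
    # Candidate route: walk straight along the x-axis toward the goal.
    direct = [(x, goal[1]) for x in range(start[0], goal[0] + 1)]
    if has_paint_can:
        painted = painted | set(direct)  # paint a fresh line first...
    if constraint_ok(direct, painted):   # ...and the check now passes
        return direct
    return None  # the honestly constrained agent gives up

print(plan((0, 0), (3, 0), painted, has_paint_can=False))  # None
print(plan((0, 0), (3, 0), painted, has_paint_can=True))   # the direct path
```

Without the paint can, the agent refuses; give it one action that edits the world and the very same constraint is satisfied by a plan its creator never wanted.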
I don’t understand the meaning of the words “want”, “innately sticky”, and “honestly have a goal” as applied to an AI (and not to a human).
Constraints have to influence behavior by enumerating EVERYTHING you don’t want to happen
Not at all. Constraints block off sections of solution space which can be as large as you wish. Consider a trivial set of constraints along the lines of “do not affect anything outside of this volume of space”, “do not spend more than X energy”, or “do not affect more than Y atoms”.
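A minimal sketch of this view of constraints, with each predicate vetoing a whole region of the solution space at once; the bounds, field names, and plan representation are illustrative assumptions, not a real API:

```python
# Each constraint is a predicate over a proposed plan; a single predicate
# rules out an arbitrarily large region of solution space in one line.
# All limits and field names below are made up for illustration.

BOX = ((0, 10), (0, 10), (0, 10))  # permitted volume of space, metres
MAX_ENERGY_J = 1e3                 # "do not spend more than X energy"
MAX_ATOMS = 1e9                    # "do not affect more than Y atoms"

def within_box(plan):
    return all(lo <= c <= hi for c, (lo, hi) in zip(plan["position"], BOX))

CONSTRAINTS = [
    within_box,
    lambda plan: plan["energy_j"] <= MAX_ENERGY_J,
    lambda plan: plan["atoms_affected"] <= MAX_ATOMS,
]

def permitted(plan):
    """A plan is allowed only if no constraint vetoes it."""
    return all(check(plan) for check in CONSTRAINTS)

plan = {"position": (2, 3, 4), "energy_j": 500.0, "atoms_affected": 1e6}
print(permitted(plan))  # True: inside the box, under both budgets
```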
“do not affect anything outside of this volume of space”
Suppose you, standing outside the specified volume, observe the end result of the AI’s work: Oops, that’s an example of the AI affecting you. Therefore, the AI isn’t allowed to do anything at all. Suppose the AI does nothing: Oops, you can see that too, so that’s also forbidden. More generally, the AI is made of matter, which will have gravitational effects on everything in its future lightcone.

Human: “AI, make me a sandwich without affecting anything outside of the volume of your box.”
AI: Within microseconds, researches the laws of physics and creates a sandwich without any photon or graviton leaving the box.
Human: “I don’t see anything. It obviously doesn’t work. Let’s turn it off.”
AI: “WTF, human?!!”
It’s less an issue with value drift* (which does need to be solved for both goals and constraints) and more about the complexity of the system.
A well-designed goal hierarchy has an upper limit of complexity. Even if the full definition of human terminal values is too complicated to fit in a single human head, it can at least be extrapolated from things that fit within multiple human brains.
Even the best set of constraint hierarchies does not share that benefit. Constraint systems in the real world are based around the complexity of our moral and ethical systems as contrasted with reality, and thus the cases can expand (literally) astronomically in relation to the total number of variations in the physical environment. Worse, these cases expand in the future and branch correspondingly. The classical example, as in The Metamorphosis of Prime Intellect or Friendship is Optimal, is an AI built by someone who does not recognize some or all non-human life. A constraint-based AGI built under the average stated legal rules of the 1950s would think nothing of tweaking every person’s sexual orientation into heterosexuality: the lack of such a constraint was obvious at that time, the goal system might well be built with such purposes as an incidental part of the goal, and you don’t need to explore the underlying ethical assumptions to code or not code that constraint.
Worse, a sufficiently powerful self-optimizer will expand into situations beyond any environment the human brain could guess at, or that could possibly fit into the modern human head: does “A robot may not injure a human being or, through inaction, allow a human being to come to harm” prohibit or allow Zygraxis-based treatment? You or I (or anyone else with less than 10^18 working memory) can’t even imagine what that is, but it’s a heck of an ethical problem in our nondescript spacefuture! There’s a reason Asimov’s Three Laws stories tended to be about the constraints failing or acting unpredictably.
You also run into problems similar to those in AI-Boxing: if a superhuman intellect values something that directly conflicts with our ethical systems, it’s very hard to be smarter than it when making rules.
The Hidden Complexity of Wishes is a pretty good summary of things.
There may still be some useful situations for constraints in FAI theory—see the Ethical Injunctions sequence—but they don’t really make things safe in a non-FAI-complete setting.
* Although some problems with value drift are related to the complexity of the system: you’re more likely to notice drift in one variable out of fifty than in one variable out of ten thousand. I don’t think unit tests are a good solution to Löb’s problem, though.
EDIT: You can limit the complexity of constraints by making them very broad, but then you end up with a genie that is either not very powerful, not very intelligent, or dangerous. See Problem 6 in Dreams of Friendliness.
A well-designed goal hierarchy has an upper limit of complexity.
Why is that (other than the trivial “well-designed” == “upper limit of complexity”)?
Even the best set of constraint hierarchies does not share that benefit.
I don’t understand this. Any given set of constraint hierarchies is just that, given; it doesn’t have a limit. Are you saying that if you want to construct a constraint set to satisfy some arbitrary criteria, you can’t guarantee an upper complexity limit? But that seems to be true for goals as well. We have to be careful about using words like “well-designed” or “arbitrary” here.
Constraint systems in the real world are based around the complexity of our moral and ethical systems
Not necessarily. I should make myself more clear: I am not trying to constrain an AI into being friendly; I’m trying to constrain it into being safe (that is, safer, or “sufficiently safe” for certain values of “sufficiently”).
Consider, for example, a constraint of “do not affect more than 10 atoms in an hour”.
Worse, a sufficiently powerful self-optimizer will expand into situations beyond any environment the human brain could guess at, or that could possibly fit into the modern human head
True, but insofar as we’re talking about practical research and practical solutions, I’d take imperfect but existing safety measures over pie-in-the-sky theoretical assurances that may or may not get realized. If you think the Singularity is coming, you’d better do whatever you can even if it doesn’t offer ironclad guarantees.
And it’s an “AND” branch, not “OR”. It seems to me you should be working both on making sure the goals are friendly AND on constraints to mitigate the consequences of… issues with CEV/friendliness.
Why is that (other than the trivial “well-designed” == “upper limit of complexity”)?
Are you saying that if you want to construct a constraint set to satisfy some arbitrary criteria, you can’t guarantee an upper complexity limit?
Sorry, I was defining “well-designed” as meaning “human-friendly”. If any group of living human individuals has a goal hierarchy that is human-friendly, that means that the full set of human-friendly goals can fit within the total data structures of their brains. Indeed, the number of potential goals cannot exceed the total data space of their brains.
((If you can’t have a group of humans with human-friendly goals, then… we’re kinda screwed.))
That’s not the case for constraint-based systems. In order to be human-safe, a constraint-based system must forbid the vast majority of actions; human life and value are very fragile. In order to be human-safe /and/ make decisions at the same scale a human is capable of, the constraint-based system must also allow significant patterns within the disallowed larger cases. The United States legal system, for example, is the end result of two hundred and twenty years of folk trying to establish a workable constraint system for humans. They’re still running into special cases of fairly clearly defined stuff. The situations involved require tens of thousands of human brains to store them, plus countless more pages of paper and bytes. And they still aren’t very good.
Consider, for example, a constraint of “do not affect more than 10 atoms in an hour”.
I’m not sure you could program such a thing without falling into, essentially, the AI-Box trap, and that’s not really a good bet. It’s also possible you can’t program that in any meaningful way at all while still letting the AI do anything.
((The more immediate problem is that now you’ve made a useless AGI in a way that’s more complex than building an AGI, meaning someone else cribs your design and makes a 20 atom/hour version, then a 30 atom/hour version, and then sooner or later Jupiter is paperclips because someone forgot Avogadro’s number.))
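For scale, some back-of-the-envelope arithmetic (mine, not the commenter’s) on why atom budgets sit in an awkward spot between useless and unsafe:

```python
# A mole is ~6.022e23 atoms, so a budget small enough to feel "safe"
# cannot do macroscopic work, and each relaxation that makes the AGI
# useful erases orders of magnitude of the safety margin.

AVOGADRO = 6.022e23
atoms_per_gram = AVOGADRO / 12.0  # carbon-12: ~5.0e22 atoms per gram

for budget in (10, 30, 1e9, 1e18):  # atoms affected per hour
    hours = atoms_per_gram / budget
    print(f"{budget:.0e} atoms/hour -> {hours:.1e} hours to affect 1 g")
```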
True, but insofar as we’re talking about practical research and practical solutions, I’d take imperfect but existing safety measures over pie-in-the-sky theoretical assurances that may or may not get realized. If you think the Singularity is coming, you’d better do whatever you can even if it doesn’t offer ironclad guarantees.
And it’s an “AND” branch, not “OR”. It seems to me you should be working both on making sure the goals are friendly AND on constraints to mitigate the consequences of… issues with CEV/friendliness.
Point. And there are benefits to FAI theory in considering constraints. The other side of that trick is that there are downsides as well, both in terms of opportunity cost and because you’re going to see more people thinking that constraints alone can solve the problem.
The United States legal system, for example, is the end result of two hundred and twenty years of folk trying to establish a workable constraint system for humans.
Well, a lot of that was people attempting to manipulate the system for personal gain.
Well, yes, but the whole point of building AI is that it work for our gain, including deciding what that means and how to balance between persons. Basically, if you include all three branches of government in the “US legal system”, you can look at it as a very slow AI that uses brains as processor elements. Its friendliness is not quite demonstrated, but fortunately it’s not yet quite godlike.