The core problem is simple: if the targeting information disappears, so does the good outcome. The hard part is knowing enough to refute every fallacious claim that the value-information can be remanufactured from nowhere.
Deep Blue’s utility function had 8,000 parts and contained a lot of information. Throw all that information away, and all you really need to reconstruct Deep Blue is the knowledge that its aim is to win games of chess. The exact details of the original utility function would not be recovered, but the eventual functional outcome would be much the same: a powerful chess computer.
The “targeting information” is actually a bunch of implementation details that can be effectively recreated from the goal—if that should prove to be necessary.
It is not precious information that must be preserved. If anything, attempts to preserve the 8,000 parts of Deep Blue’s utility function while improving it would actually have a crippling negative effect on its future development. Similarly with human values: those are a bunch of implementation details—not the real target.
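To make the proxy-versus-goal distinction concrete, here is a minimal Python sketch with invented feature names and weights (Deep Blue’s real evaluation function was nothing this simple): a bag of tuned heuristics standing in for the proxy, and the bare goal of winning as the target.

```python
# Illustrative only: the features, weights, and weighted-sum form are
# assumptions for the sake of the example, not Deep Blue's actual design.
HEURISTIC_WEIGHTS = {
    "material_balance": 1.0,
    "king_safety": 0.4,
    "pawn_structure": 0.2,
    # ...imagine roughly 8,000 such hand-tuned terms
}

def heuristic_utility(features):
    """The 'implementation details': a proxy score for a chess position."""
    return sum(HEURISTIC_WEIGHTS.get(name, 0.0) * value
               for name, value in features.items())

def true_goal(game_result):
    """The target all those heuristics exist to serve: winning."""
    return {"win": 1, "draw": 0, "loss": -1}[game_result]
```

On this picture, losing `HEURISTIC_WEIGHTS` is recoverable in principle: re-optimize against `true_goal` and you get some equally serviceable set of weights back.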
If Deep Blue had emotions and desires attached to the 8,000 parts of its utility function, if it drew great satisfaction, meaning, and joy from executing those 8,000 parts regardless of whether doing so resulted in winning a chess game, then yes, those 8,000 parts would be precious information that needed to be preserved. It would be a horrible disaster if they were lost. They wouldn’t be the programmer’s real target, but why in the world would Emotional Deep Blue care about what its programmer wanted? It wouldn’t want to win at chess; it would want to implement those 8,000 parts! That’s its real target!
For humans, our real target is all those complex values that evolution metaphorically “programmed” into us. We don’t care at all about what evolution’s “real target” was. If those values were destroyed or replaced then it would be bad for us because those values are what humans really care about. Saying humans care about genetic fitness because we sometimes accidentally enhance it when we are fulfilling our real values is like saying that automobile drivers care about maximizing CO2 content in the atmosphere because they do that by accident when they drive. Humans don’t care about genetic fitness, we never have, and hopefully we never will.
In fact, evolution doesn’t even have a real target. It’s an abstract statistical description of certain trends in the history of life. When we speak of it “wanting” things or having “goals,” that’s not because it really does. Humans are good at understanding the minds of other humans but bad at understanding abstract processes, so describing evolution metaphorically as a human-like mind with goals helps people grasp how it works, even though the description isn’t literally true. Modeling evolution as having a “goal” is less accurate, but it makes up for that by being easier for a human brain to run.
When you say that preserving those parts of the utility function would have a “crippling negative” effect, you are forgetting an important referent: negative for whom? Evolution has no feelings or desires, so preserving human values would not be crippling or negative for it; nothing is crippling or negative for it, since it doesn’t really have any feelings or goals. It literally doesn’t care about anything. By contrast, humans do have feelings and desires, so failing to preserve our values would have a crippling and negative effect on our future development, because we would lose something we deeply care about.
The problem with a self-improving Deep Blue preserving its 8,000 heuristics is that doing so might cause it to lose games of chess to a player with a better representation of the target. If that happens, its 8,000 heuristics will probably turn out to assign very low values to the resulting lost games. Of course, that means the values weren’t very effectively maximized in the first place. Just so: that’s one of the problems with working from a dud set of heuristics that poorly encode your target.
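As a toy illustration (all numbers invented), a dud proxy can rate a position highly even though play from that position ends in a loss the proxy itself would score very low:

```python
# Hypothetical numbers: the proxy poorly encodes the real target.
def proxy_score(material, mobility):
    return 1.0 * material + 0.1 * mobility  # made-up weights

def outcome_value(result):
    return {"win": 1, "draw": 0, "loss": -1}[result]

print(proxy_score(material=2.0, mobility=15.0))  # 3.5: proxy says "great position"
print(outcome_value("loss"))                     # -1: the game from here is lost
```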
We potentially face a similar issue. Plenty of folks would love to live in a world where their every desire is satisfied and they live in continual ecstasy. However, pursuing such goals in the short term could easily lead humanity towards long-term extinction. We face much the same problem with our values that a self-improving Deep Blue faces with its heuristics.
This issue doesn’t have anything particularly to do with the difference between psychological and genetic optimization targets. Both genes and minds value dying out very negatively. They agree on the relevant values.
There’s a proposed solution to this problem: pursue universal instrumental values until you have conquered the universe, and then switch to pursuing your “real” values. However, it’s a controversial proposal. When will you be confident of not facing a stronger opponent with different values? How much does lugging those “true values” around for billions of years actually cost?
My position is that you’ll probably never know that you are safe, and that the cost isn’t that great—but that any such expense is an intolerable squandering of resources.
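For concreteness, the disputed proposal amounts to a two-phase policy, sketched below. The threshold, and the assumption that “safe” is something you can ever read off your situation, are my simplifications, and they are exactly what the objections above target:

```python
# Sketch of the two-phase proposal. The resource threshold is arbitrary,
# and whether any such "am I safe yet?" test can be trusted is the
# contested point.
def current_goal(resources, safety_threshold=1e6):
    if resources < safety_threshold:
        return "pursue universal instrumental values"  # phase 1: expand and survive
    return "pursue your 'true' values"                 # phase 2: spend the winnings
```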
Both genes and minds value dying out very negatively. They agree on the relevant values.
Minds value not dying out because dying out would mean they could no longer pursue their “true values,” not because survival is an end in itself. Imagine we were given a choice between:
A) The human race dies out.
B) The human race survives forever, but every human being alive now or ever to be born will be tortured 24/7 by a sadistic AI.
Any sane person would choose A. That’s because in scenario B the human race, even though it survives, is unable to pursue any of its values, and is forced to pursue one of its major disvalues.
There is no point in the human race surviving if it can’t pursue its values.
I personally think the solution for the species is the same as it is for an individual: mix the pursuit of terminal and instrumental values. I do this every day, and I assume you do as well. I spend lots of time and effort making sure that I will survive and exist in the future. But I also take minor risks, such as driving a car, in order to lead a more fun and interesting life.
Carl’s proposal sounds pretty good to me. Yes, it has dangers, as you correctly pointed out. But some level of danger has to be accepted in order to live a worthwhile life.
There is no point in the human race surviving if it can’t pursue its values.
It’s likely not to be a binary decision. We may well be able to trade preserving values against a better chance of surviving at all. The more we deviate from universal instrumental values, the greater our chance of being wiped out by accidents or aliens. The more we adhere to universal instrumental values, the more of our own values get lost.
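One way to see the non-binary nature of the tradeoff is a toy expected-value model; every number here is invented purely for illustration:

```python
# Toy model: x is the fraction of resources devoted to universal
# instrumental values (survival); 1 - x is kept for our own values.
def expected_value(x):
    survival_prob = 0.2 + 0.8 * x   # assumed: more UIV, better odds
    value_if_survive = 1.0 - x      # assumed: more UIV, fewer values realized
    return survival_prob * value_if_survive

best = max((i / 100 for i in range(101)), key=expected_value)
print(best, expected_value(best))  # prints an interior optimum near 0.37
```

Under these made-up curves the optimum is a mix, not either extreme, which is the point: the interesting question is where on the dial to sit, not which end.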
Since I see our values heavily overlapping with universal instrumental values, adopting them doesn’t seem too bad to me—while all our descendants being wiped out seems pretty negative—although also rather unlikely.
How to deal with this tradeoff is a controversial issue. However, it certainly isn’t obvious that we should struggle to preserve our human values—and resist adopting universal instrumental values. That runs a fairly clear risk of screwing up the future for all our descendants.
It’s likely not to be a binary decision. We may well be able to trade preserving values against a better chance of surviving at all. [...] How to deal with this tradeoff is a controversial issue. However, it certainly isn’t obvious that we should struggle to preserve our human values—and resist adopting universal instrumental values. That runs a fairly clear risk of screwing up the future for all our descendants.
If that’s the case I don’t think we disagree about anything substantial. We probably just disagree about what percentage of resources should go to UIV and what should go to terminal values.
Since I see our values heavily overlapping with universal instrumental values, adopting them doesn’t seem too bad to me
You might be right to some extent. Human beings tend to place great terminal value on big, impressive achievements, and quickly colonizing the universe would certainly involve doing that.
If that’s the case I don’t think we disagree about anything substantial. We probably just disagree about what percentage of resources should go to UIV and what should go to terminal values.
It’s a tricky and controversial issue. The cost of preserving our values looks fairly small—but any such expense diverts resources away from the task of surviving—and increases the risk of eternal oblivion. Those who are wedded to the idea of preserving their values will need to do some careful accounting on this issue, if they want the world to run such risks.
While the phrase “universal instrumental values” has the word “instrumental” in it, that’s just one way of thinking about them. You could also call them “nature’s values” or “god’s values”. You can contrast them with human values—but it isn’t really an “instrumental vs terminal” issue.