Who is your target audience? Can you pretend to be the actual person you are trying to convince and do your absolute best to demolish the arguments presented in this paper? (You can find their arguments in their publications and apply them to your paper.) And no counter-objections until you've finished writing what is essentially a referee report. If you need some extra motivation, pretend that you are being paid $100 for each argument that convinces the rest of the audience and $1000 for each argument that convinces the paper author. When done, post the referee report here, and people will tell you whether you did a good job.
No, I cannot. I’ve read the various papers, and they all orbit around an implicit and often unstated moral realism. I’ve also debated philosophers on this, and the same issue rears its head—I can counter their arguments, but their opinions don’t shift. There is an implicit moral realism that does not make any sense to me, and the more I analyse it, the less sense it makes, and the less convincing it becomes. Every time a philosopher has encouraged me to read a particular work, it’s made me find their moral realism less likely, because the arguments are always weak.
I can’t really put myself in their shoes to successfully argue their position (which I could do with theism, incidentally). I’ve tried and failed.
If someone can help me with this, I'd be most grateful. Why does "for reasons we don't know, any being will come to share and follow specific moral principles (but we don't know what they are)" ever come to seem plausible?
Just how diverse is human motivation? Should we discount even sophisticated versions of psychological hedonism? Undoubtedly, the "pleasure principle" is simplistic as it stands. But one good reason not to try heroin, for example, is precisely that the reward architecture of our opioid pathways is so similar. Previously diverse life-projects of first-time heroin users are at risk of converging on a common outcome. So more broadly, let's consider the class of life-supporting Hubble volumes where sentient biological robots acquire the capacity to rewrite their genetic source code and gain mastery of their own reward circuitry. May we predict orthogonality or convergence? Certainly, there are strong arguments why such intelligences won't all become the functional equivalent of heroin addicts or wireheads or Nozick Experience Machine VR-heads (etc). One such argument is the nature of selection pressure. But _if_ some version of the pleasure principle is correct, then isn't some version of the convergence conjecture at least feasible, i.e. they'll recalibrate the set-point of their hedonic treadmill and enjoy gradients of (super)intelligent (super)happiness? One needn't be a meta-ethical value-realist to acknowledge that subjects of experience universally find bliss empirically more valuable than agony or despair. The present inability of natural science to explain first-person experiences doesn't confer second-rate ontological status. If I may quote physicist Frank Wilczek:
"It is reasonable to suppose that the goal of a future-mind will be to optimize a mathematical measure of its well-being or achievement, based on its internal state. (Economists speak of 'maximizing utility', normal people of 'finding happiness'.) The future-mind could discover, by its powerful introspective abilities or through experience, its best possible state, the Magic Moment, or several excellent ones. It could build up a library of favourite states. That would be like a library of favourite movies, but more vivid, since to recreate magic moments accurately would be equivalent to living through them. Since the joys of discovery, triumph and fulfillment require novelty, to re-live a magic moment properly, the future-mind would have to suppress memory of that moment's previous realizations.
A future-mind focused upon magic moments is well matched to the limitations of reversible computers, which expend no energy. Reversible computers cannot store new memories, and they are as likely to run backwards as forwards. Those limitations bar adaptation and evolution, but invite eternal cycling through magic moments. Since energy becomes a scarce quantity in an expanding universe, that scenario might well describe the long-term future of mind in the cosmos." (Frank Wilczek, "Big troubles, imagined and real", in Global Catastrophic Risks, eds. Nick Bostrom and Milan M. Cirkovic, OUP, 2008)
So is convergence on the secular equivalent of Heaven inevitable? I guess not. One can think of multiple possible defeaters. For instance, if the IJ Good / SIAI conception of the Intelligence Explosion (as I understand it) is correct, then the orthogonality thesis is plausible for a hypothetical AGI. On this story, might e.g. an innocent classical utilitarian build AGI-in-a-box that goes FOOM and launches a utilitronium shockwave? (etc) But in our current state of ignorance, I’m just not yet convinced we know enough to rule out the convergence hypothesis.
David, what are those multiple possible defeaters for convergence? As I see it, the practical defeaters that exist still don't affect the convergence thesis itself; they are just possible practical impediments, arising from unintelligent agents, to the realization of the goals of convergence.
I usually treat this behavior as something similar to the availability heuristic.
That is, there’s a theory that one of the ways humans calibrate our estimates of the likelihood of an event X is by trying to imagine an instance of X, and measuring how long that takes, and calculating our estimate of probability inverse-proportionally to the time involved. (This process is typically not explicitly presented to conscious awareness.) If the imagined instance of X is immediately available, we experience high confidence that X is true.
That mechanism makes a certain amount of rough-and-ready engineering sense, though of course it has lots of obvious failure modes, especially as you expand the system’s imaginative faculties. Many of those failure modes are frequently demonstrated in modern life.
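To make that mechanism concrete, here is a toy sketch. (The inverse-proportionality rule and its squashing into a probability are illustrative assumptions of mine, not a claim about the actual neural implementation.)

```python
import time

def availability_estimate(imagine, scale=1.0):
    """Toy model: confidence in X is inversely proportional to how
    long it takes to bring an instance of X to mind."""
    start = time.perf_counter()
    instance = imagine()               # try to recall/construct an instance of X
    elapsed = time.perf_counter() - start
    if instance is None:               # nothing came to mind at all
        return 0.0
    return scale / (scale + elapsed)   # squash into (0, 1]: fast recall -> ~1

# A vivid, cached scenario comes to mind almost instantly -> high confidence.
print(availability_estimate(lambda: "a tiger eating my children is bad"))

# A scenario that takes real work to imagine -> noticeably lower confidence.
def slow_imagine():
    time.sleep(0.5)                    # stand-in for effortful imagination
    return "a tiger eating my children not feeling like a bad thing"

print(availability_estimate(slow_imagine))
```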
The thing is, we use much of the same machinery that we evolved for considering events like “a tiger eats my children” to consider pseudo-events like “a tiger eating my children is a bad thing.” So it’s easy for us to calibrate our estimates of the likelihood that a tiger eating my children is a bad thing in the same way: if an instance of a tiger eating my children feeling like a bad thing is easy for me to imagine, I experience high confidence that the proposition is true. It just feels obvious.
I don’t think this is quite the same thing as moral realism, but when that judgment is simply taken as an input without being carefully examined, the result is largely equivalent.
Conversely, the more easily I can imagine a tiger eating my children not feeling like a bad thing, the lower that confidence. More generally, the more I actually analyze (rather than simply referencing) my judgments, the less compelling this mechanism becomes.
What I expect, given the above, is that if I want to shake someone off that kind of naive moral realist position, it helps to invite them to consider situations in which they arrive at counterintuitive (to them) moral judgments. The more I do this, the less strongly the availability heuristic fires, and over time this will weaken that leg of their implicit moral realism, even if I never engage with it directly.
I’ve known a number of people who react very very negatively to being invited to consider such situations, though, even if they don’t clearly perceive it as an attack on their moral confidence.
But philosophers are extremely fond of analysis, and make great use of trolley problems and similar edge cases. I’m really torn—people who seem very smart and skilled in reasoning take positions that seem to make no sense. I keep telling myself that they are probably right and I’m wrong, but the more I read about their justifications, the less convincing they are...
Yeah, that’s fair. Not all philosophers do this, any more than all computer programmers come up with test cases to ensure their code is doing what it ought, but I agree it’s a common practice.
Can you summarize one of those positions as charitably as you're able to? It might be that, given that summary, someone else can offer an insight that extends that structure.
“There are sets of objective moral truths such that any rational being that understood them would be compelled to follow them”. The arguments seem mainly to be:
1) Playing around with the meaning of rationality until you get something (“any rational being would realise their own pleasure is no more valid than that of others” or “pleasure is the highest principle, and any rational being would agree with this, or else be irrational”)
2) Convergence among human values.
3) Moral progress for society: we’re better than we used to be, so there needs to be some scale to measure the improvements.
4) Moral progress for individuals: when we think about things a lot, we make better moral decisions than when we were young and naive. Hence we're getting better at moral reasoning, so there is some scale on which to measure this.
5) Playing around with the definition of “truth-apt” (able to have a valid answer) in ways that strike me, uncharitably, as intuition-pumping word games. When confronted with this, I generally end up saying something like “my definitions do not map on exactly to yours, so your logical steps are false dichotomies for me”.
6) Realising things like “if you can’t be money pumped, you must be an expected utility maximiser”, which implies that expected utility maximisation is superior to other reasoning, hence that there are some methods of moral reasoning which are strictly inferior. Hence there must be better ways of moral reasoning and (this is the place where I get off) a single best way (though that argument is generally implicit, never explicit).
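For concreteness, here is a minimal money-pump sketch. (The preference cycle and the one-unit fee are illustrative assumptions; the point is only that an agent with cyclic preferences can be traded around the loop indefinitely, whereas transitive, e.g. expected-utility, preferences admit no such cycle.)

```python
# Minimal money-pump sketch: an agent with cyclic (intransitive)
# preferences pays a small fee for each "upgrade" and ends up
# back where it started, strictly poorer.

prefers = {("B", "A"), ("C", "B"), ("A", "C")}  # cyclic: A < B < C < A

def accepts_trade(held, offered):
    # The agent trades whenever the offered good is preferred to the held one.
    return (offered, held) in prefers

money, held = 100.0, "A"
for _ in range(3):                  # three full cycles: A -> B -> C -> A
    for offered in "BCA":
        if accepts_trade(held, offered):
            held = offered
            money -= 1.0            # fee paid for each "upgrade"

print(held, money)                  # A 91.0 -- same good, 9 units poorer
```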
(nods) Nice.
OK, so let me start out by saying that my position is similar to yours… that is, I think most of this is nonsense. But having said that, and trying to adopt the contrary position for didactic purposes… hm.
So, a corresponding physical-realist assertion might be that there are sets of objective physical structures such that any rational being that perceived the evidence for them would be compelled to infer their existence. (Yes?)
Now, why might one believe such a thing? Well, some combination of reasons 2-4 seems to capture it.
That is: in practice, there at least seem to be physical structures we all infer from our senses such that we achieve more well-being with less effort when we act as though those structures existed. And there are other physical structures that we infer the existence of via a more tenuous route (e.g., the center of the Earth, or Alpha Centauri, or quarks, or etc.), to which #2 doesn’t really apply (most people who believe in quarks have been taught to believe in them by others; they mostly didn’t independently converge on that belief), but 3 and 4 do… when we posit the existence of these entities, we achieve worthwhile things that we wouldn’t achieve otherwise, though sometimes it’s very difficult to express clearly what those things actually are. (Yes?)
So… ok. Does that case for physical realism seem compelling to you?
If so, and if arguments 2-4 are sufficient to compel a belief in physical realism, why are their analogs insufficient to compel a belief in moral realism?
No—to me it just highlights the difference between physical facts and moral facts, making them seem very distinct. But I can see how if we had really strong 2-4, it might make more sense...
I’m not quite sure I understood you. Are you saying “no,” that case for physical realism doesn’t seem compelling to you? Or are you saying “no,” the fact that such a case can compellingly be made for physical realism does not justify an analogous case for moral realism?
The second one!
So, given a moral realist, Sam, who argued as follows:
“We agree that humans typically infer physical facts such that we achieve more well-being with less effort when we act as though those facts were actual, and that this constitutes a compelling case for physical realism. It seems to me that humans typically infer moral facts such that we achieve more well-being with less effort when we act as though those facts were actual, and I consider that an equally compelling case for moral realism.”
...it seems you ought to have a pretty good sense of why Sam is a moral realist, and what it would take to convince Sam they were mistaken.
No?
Interesting perspective. Is this an old argument, or a new one? (seems vaguely similar to the Pascalian “act as if you believe, and that will be better for you”).
It might be formalisable in terms of bounded agents and stuff. What’s interesting is that though it implies moral realism, it doesn’t imply the usual consequence of moral realism (that all agents converge on one ethics). I’d say I understood Sam’s position, and that he has no grounds to disbelieve orthogonality!
I’d be astonished if it were new, but I’m not knowingly quoting anyone.
As for orthogonality… well, hm. Continuing the same approach… suppose Sam says to you:
“I believe that any two sufficiently intelligent, sufficiently rational systems will converge on a set of confidence levels in propositions about physical systems, both coarse-grained (e.g., “I’m holding a rock”) and fine-grained (e.g. some corresponding statement about quarks or configuration spaces or whatever). I believe that precisely because I’m a de facto physical realist; whatever it is about the universe that constrains our experiences such that we achieve more well-being with less effort when we act as though certain statements about the physical world are true and other statements are not, I believe that’s an intersubjective property—the things that it is best for me to believe about the physical world are also the things that it is best for you to believe about the physical world, because that’s just what it means for both of us to be living in the same real physical world.
For precisely the same reasons, I believe that any two sufficiently intelligent, sufficiently rational systems will converge on a set of confidence levels in propositions about moral systems.”
You consider that reasoning ungrounded. Why?
1) Evidence. There is a general convergence on physical facts, but nothing like a convergence on moral facts. Also, physical facts, since the advent of science, are progressive (we don't say Newton was wrong, we say we have a better theory of which his was an approximation).
2) Evidence. We have established what counts as evidence for a physical theory (and have, to some extent, separated it from simply “everyone believes this”). What then counts as evidence for a moral theory?
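As an aside, the convergence we do get on physical facts can be given a toy model: two Bayesian agents with very different priors about a coin's bias, updating on the same evidence, end up with nearly identical posteriors. (A sketch under idealised assumptions of my own choosing: shared evidence, conjugate Beta priors that rule nothing out.)

```python
import random

random.seed(0)
true_bias = 0.7                        # the "physical fact": P(heads)
flips = [random.random() < true_bias for _ in range(1000)]
heads = sum(flips)
tails = len(flips) - heads

# Two agents with very different Beta priors over the coin's bias.
priors = {"optimist": (8.0, 2.0), "pessimist": (2.0, 8.0)}

for name, (a, b) in priors.items():
    # Conjugate update: posterior is Beta(a + heads, b + tails).
    post_mean = (a + heads) / (a + b + len(flips))
    print(name, round(post_mean, 3))

# Shared evidence swamps the divergent priors: both posterior means
# land within about 0.01 of each other (and near the true 0.7).
```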
Awesome! So, reversing this, if you want to understand the position of a moral realist, it sounds like you could consider them in the position of a physical realist before the Enlightenment.
There was disagreement then about underlying physical theory, and indeed many physical theories were deeply confused, and the notion of evidence for a physical theory was not well-formalized, but if you asked a hundred people questions like “is this a rock or a glass of milk?” you’d get the same answer from all of them (barring weirdness), and there were many physical realists nevertheless based solely on that, and this is not terribly surprising.
Similarly, there is disagreement today about moral theory, and many moral theories are deeply confused, and the notion of evidence for a moral theory is not well-formalized, but if you ask a hundred people questions like “is killing an innocent person right or wrong?” you’ll get the same answer from all of them (barring weirdness), so it ought not be surprising that there are many moral realists based on that.
I think there may be enough “weirdness” in response to moral questions that it would be irresponsible to treat it as dismissible.
Yes, there may well be.
Interesting. I have no idea if this is actually how moral realists think, but it does give me a handle so that I can imagine myself in that situation...
Sure, agreed.
I suspect that actual moral realists think in lots of different ways. (Actual physical realists do, too.)
But I find that starting with an existence-proof of “how might I believe something like this?” makes subsequent discussions easier.
I could add: Objective punishments and rewards need objective justification.
From my perspective, treating rationality as always instrumental, and never a terminal value, is playing around with its traditional meaning. (And indiscriminately teaching instrumental rationality is like indiscriminately handing out weapons. The traditional idea, going back to at least Plato, is that teaching someone to be rational improves them... changes their values.)
Stuart, here is a defense of moral realism:
http://lesswrong.com/lw/gnb/questions_for_moral_realists/8g8l
My paper which you cited needs a bit of updating. Indeed, some cases might lead a superintelligence to collaborate with agents lacking the right ethical mindset (unethical agents), which constitutes an important existential risk (a reason why I was a bit reluctant to publish much about it).
However, isn't the orthogonality thesis basically about the orthogonality between ethics and intelligence? In that case, the convergence thesis would not be flawed if some unintelligent agents kidnap and force an intelligent agent to act unethically.
Another argument for moral realism:
Let's imagine starting with a blank slate, the physical universe, and building ethical value in it. In a hypothetical meta-ethical scenario of error theory (which I assume is where you're coming from), or of possible variability of values, this kind of "bottom-up" reasoning would make sense for more intelligent agents that could alter their own values: they could find, from the bottom up, values that could be more optimally produced, and such reasoning would also help them fundamentally understand meta-ethics and the nature of value.
In order to connect to the production of some genuine ethical value in this universe, arguably some things would have to be built the same way, under certain conditions, while hypothetically other things could vary in the value production chain. This is because ethical value could not be absolutely anything, otherwise those things could not be genuinely valuable. If everything could be fundamentally valuable, then nothing really would be, because value requires a discrimination in terms of better and worse. Somewhere in the value production chain, some things would have to be constant in order for there to be genuine value. Do you agree so far?
If some things have to be constant in the value production chain, and some things could hypothetically vary, then the constant things would be the really important ones in creating value, and the variable things would be accessory, and could be randomly specified with some degree of freedom by those analyzing value production from a "bottom-up" perspective in a physical universe. It would seem therefore that the constant things are likely what is truly valuable, while the variable and accessory things could be mere triggers or engines in the value production chain.
I argue that, in the case of humans and of this universe, the constant things are what really constitute value. There is some constant and universal value in the universe, or meta-ethical moral realism. The variable things, which are accessory, triggers or engines in the value production chain, are preferences or tastes. Those preferences that are valid are those that ultimately connect to what is constant in producing value.
Now, from an empirical perspective, what ethical value has in common in this universe is its relationship to consciousness. What happens in totally unconscious regions of the universe doesn’t have any ethical relevance in itself, and only consciousness can ultimately have ethical value.
Consciousness is a peculiar physical phenomenon. It is representational in its nature, and as a representation it can freely differ or vary from the objects it represents. This difference or variability could be, for example, representing a wavelength of light in the visual field as a phenomenal color, or dreaming of unicorns, both of which transcend the original sources of data in the physical universe. The existence of consciousness is the most epistemologically certain thing there is for conscious observers; this certainty is higher than that of any object in this universe, because while objects could be illusions arising from the aforementioned variability in representation, consciousness itself is the most directly verifiable phenomenon. Therefore, the existence of conscious perceptions is more certain than the physical universe or any physical theories, for example, since those could hypothetically be the product of false world simulations.
Consciousness can produce ethical value due to the transcendental freedom afforded by its representational nature, which is the same freedom that allows the existence of phenomenal colors.
Ethics is about defining value, what is good and bad, and how to produce it. If consciousness is what contains ethical value, then this ethical value lies in good and bad conscious experiences.
Variability in the production chain of good and bad conscious experiences for humans is accessory, taking the form of preferences and tastes, and in their ethical dimension these ultimately connect to good and bad conscious experiences. From a physical perspective, it could be said that the direct production of good and bad conscious experiences by nerve cells in brains is what constitutes direct ethical value, and that preferences are accessory triggers or engines that lead to this ethical value production. From the foregoing, it follows that preferences are only ethically valid insofar as they connect to good and bad conscious experiences, in the present or future. People's brains are like labyrinths with different paths ultimately leading to the production of good and bad feelings, but what matters is that production, not the initial triggers that pass through that labyrinth.
By the previous paragraphs, we have moral realism and constant values, with variability only apparent or accessory. So greater intelligence would find this and not vary. Now, depending on the question of personal identity, you may ask: what about selfishness?
How about morality as an attractor—which nature approaches. Some goals are better than others—evolution finds the best ones.
Why do we have any reason to think this is the case?
So: game theory: reciprocity, kin selection/tag-based cooperation and virtue signalling.
As J. Storrs Hall puts it in "Intelligence Is Good":
"There is but one good, namely, knowledge; and but one evil, namely ignorance." —Socrates, from Diogenes Laertius's Life of Socrates
"As a matter of practical fact, criminality is strongly and negatively correlated with IQ in humans. The popular image of the tuxedo-wearing, suave jet-setter jewel thief to the contrary notwithstanding, almost all career criminals are of poor means as well as of lesser intelligence."
Defecting typically ostracises you—and doesn't make much sense in a smart society which can track reputations.
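A minimal sketch of that reputation effect, with population size, payoffs, and the "standing" rule as illustrative assumptions: in a repeated prisoner's dilemma with public reputations, a defector exploits each partner at most once and finishes far behind the cooperators.

```python
import itertools

# Prisoner's dilemma payoffs for (my_move, their_move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

agents = {f"coop{i}": "C" for i in range(8)} | {f"cheat{i}": "D" for i in range(2)}
defectors = set()                  # public reputation: known unprovoked defectors
score = {name: 0 for name in agents}

for a, b in itertools.combinations(agents, 2):
    for _ in range(20):            # repeated interactions with the same partner
        # Conditional cooperators defect only against known defectors.
        move_a = "D" if agents[a] == "D" or b in defectors else "C"
        move_b = "D" if agents[b] == "D" or a in defectors else "C"
        score[a] += PAYOFF[(move_a, move_b)]
        score[b] += PAYOFF[(move_b, move_a)]
        # "Standing" rule: defecting on a partner in good standing ruins
        # your reputation; retaliating against a known defector does not.
        a_unprovoked = move_a == "D" and b not in defectors
        b_unprovoked = move_b == "D" and a not in defectors
        if a_unprovoked: defectors.add(a)
        if b_unprovoked: defectors.add(b)

for name, s in sorted(score.items(), key=lambda kv: -kv[1]):
    print(name, s)
# Cooperators finish far ahead: a defector exploits each partner at most
# once, then faces mutual defection for every remaining round.
```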
We already know about universal instrumental values. They illustrate what moral attractors look like.
I discussed this issue some more in Handicapped Superintelligence.
Doesn’t most of this amount to morality as an attractor for evolved social species?
Evolution creates social species, though. Machines will be social too—their memetic relatedness might well be very high—an enormous win for kin selection-based theories based on shared memes. Of course they are evolving, and will evolve too—cultural evolution is still evolution.
So this presumes that the machines in question will evolve in social settings? That's a pretty big assumption. Moreover, empirically speaking, having in-group loyalty of that sort isn't nearly enough to ensure that you are friendly with nearby entities: look at how many hunter-gatherer groups are in a state of almost constant war with their neighbors. The attitude towards other sentients (such as humans) isn't going to be great even if there is some approximate moral attractor of that sort.
I’m not sure what you mean. It presumes that there will be more than one machine. The ‘lumpiness’ of the universe is likely to produce natural boundaries. It seems to be a small assumption.
Sure, but cultural evolution produces cooperation on a massive scale.
Right—so: high morality seems to be reasonably compatible with some ant-squishing. The point here is about moral attractors—not the fate of humans.
It is a major assumption. To take the most obvious issue: if someone is starting up an attempted AGI on a single computer (say it is the only machine that has enough power), then this won't happen. It also won't happen unless there is a large variety of machines actually engaging in generational copying. That means that if, say, one starts with ten slightly different machines and the population doesn't grow into distinct entities, this isn't going to do what you want. And if the entities lack a distinction between genotype and phenotype (as computer programs, unlike biological entities, actually do), then this is also off, because one will not be subject to a Darwinian system but rather a pseudo-Lamarckian one which doesn't act the same way.
So your point seems to come down purely to the fact that evolved entities will do this, and a vague hope that people will deliberately put entities into this situation. This is both unhelpful for the fundamental philosophical claim (which doesn't care about what is empirically likely to happen) and not practically helpful, since there's no good reason to think that any machine entities will actually be put into such a situation.
A multi-planetary living system is best described as being multiple agents, IMHO. The unity you suggest would represent relatedness approaching 1: the ultimate win in terms of altruism and cooperation.
Without copying there's no life. Copying is unavoidable. Variation is practically inevitable too—for instance, local adaptation.
Computer programs do have the split between heritable and non-heritable elements—which is the basic idea here, or it should be.
Darwin believed in cultural evolution: “The survival or preservation of certain favoured words in the struggle for existence is natural selection”—so surely cultural evolution is Darwinian.
Most of the game theory that underlies cooperation applies to both cultural and organic evolution. In particular, reciprocity, kin selection, and reputations apply in both domains.
I didn’t follow that bit—though I can see that it sounds a bit negative.
Evolution has led to social, technological, intellectual and moral progress. It’s conservative to expect these trends to continue.
Attractors are features of evolutionary systems; it'd be weird if there weren't attractors in goal space. Here's a paper which touches on that (I don't necessarily buy all of it, but the part about morality as an attractor in goal systems of evolving, cooperating game-theoretic agents is interesting).
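For a concrete picture of an attractor in an evolutionary system, here is a toy replicator-dynamics sketch using the standard hawk-dove game (textbook payoffs, not taken from the paper): whatever strategy mix the population starts with, it is pulled toward the same stable equilibrium.

```python
# Replicator dynamics for the hawk-dove game: an attractor in strategy space.
# Payoffs: V (resource value) = 4, C (cost of fighting) = 8.
V, C = 4.0, 8.0
payoff = [[(V - C) / 2, V],        # hawk vs (hawk, dove)
          [0.0,         V / 2]]    # dove vs (hawk, dove)

def step(p_hawk, dt=0.01):
    """One Euler step of the replicator equation for the hawk share p."""
    f_hawk = p_hawk * payoff[0][0] + (1 - p_hawk) * payoff[0][1]
    f_dove = p_hawk * payoff[1][0] + (1 - p_hawk) * payoff[1][1]
    f_avg = p_hawk * f_hawk + (1 - p_hawk) * f_dove
    return p_hawk + dt * p_hawk * (f_hawk - f_avg)

for start in (0.05, 0.5, 0.95):    # very different initial populations...
    p = start
    for _ in range(5000):
        p = step(p)
    print(start, "->", round(p, 3))  # ...all converge to p = V/C = 0.5
```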
Sure. Think about the optimal creature—for instance—and don’t anybody tell me that fitness is relative to the environment—we can see the environment.
Another point: even if there's no competition (and natural selection) involving alien races, the fear of such competition is likely to produce a similar adaptive effect—moving effective values towards universal instrumental values.
You have made a number of posts on paraconsistent logic. Now it’s time to walk the walk. For the purpose of this referee report, accept moral realism and use it explicitly to argue with your paper.
It’s not that simple. I can’t figure out what the proposition being defended is exactly. It shifts in ways I can’t predict in the course of arguments and discussions. If I tried to defend it, my defence would end up being too caricatural or too weak.
Is your goal to affect their point of view? Or is it something else? For example, maybe your true target audience is those who donate to your organization and you just want to have a paper published to show them that they are not wasting their money. In any case, the paper should target your real audience, whatever it may be.
I want a paper to point those who make the thoughtless “the AI will be smart, so it’ll be nice” argument to. I want a paper that forces the moral realists (using the term very broadly) to make specific counter arguments. I want to convince some of these people that AI is a risk, even if it’s not conscious or rational according to their definitions. I want something to build on to move towards convincing the AGI researchers. And I want a publication.