Yes, it is not a ‘reductio ad absurdum’ in general, you are right. But it is one in the specific case of agents (like ourselves). I cannot decide that my suffering is not undesirable to me, and so I am limited to a normative frame of reference in at least this case.
In addition to the stickiness of institutional beliefs, I would add that agents cannot individually decide against their own objective functions (except merely verbally). In the case of humans, we cannot decide what qualities our phenomenal experience will have; it is a fact of the matter, rather than an opinion, that suffering is undesirable for oneself, and so on. One can verbally pronounce that “I don’t care about my suffering”, but the phenomenal experience of badness will in fact remain.
Your point is exactly what has prevented me from adopting the orthodox LessWrong position. If I knew that in the future Clippy was going to kill me and everyone else, I would consider that a neutral outcome. If, however, I knew that in the future some group of humans was going to successfully align an AGI to their interests, I would be far more worried.
If anyone knows of an Eliezer or SSC-level rebuttal to this, please let me know so that I can read it.
You say evil is not just goodness minimization, but what does that mean for utilitarianism, which has no specific concept of “evil” as distinct from “bad”?
“Evil is not just a goodness minimization problem” makes sense, but “Badness is not just a goodness minimization problem” doesn’t make sense to me. Your analysis hinges on a concept of evil as distinct from the merely bad.
In the trolley problem, sacrificing the one to save the five will always lead to less badness, because fewer people are dead in the resulting state of the world than would otherwise be. This is why utilitarianism always chooses to sacrifice the one to save the five, ceteris paribus. Whether less badness is the same thing as less evilness is not considered, because utilitarianism has only one concept of utility. There may be additional contextual facts, e.g. that the person making the decision is employed as a switch operator. But unless these facts influence the resulting world-state (i.e. the number of casualties), they will not factor into the utility calculation.
Therefore, I do not think that your analysis works with utilitarianism, though it may work for other ethical systems.
Now, you might say that the correct point of measurement is the highest-positive-utility act, and that utilitarianism says all acts are measured relative to it. But this is not a position I believe is universally supported; certainly Karl Popper argued against this view of utilitarianism, proposing the framework of negative utilitarianism (I think he invented it?) as a solution to the problems he saw with it.
Total Act Utilitarianism is what comes to mind when I think of a “standard” utilitarian theory. Your theory seems like a kind of rule or non-total variant. Your alterations would not sit well with someone like Peter Singer, who thinks that we have an obligation to help people simply because doing so could improve their lives. Where you see neutrality, he would see obligation.
It should add up to normality, after all.
I disagree, and I think that you are more of a relativist than you are letting on. Ethics should be able to teach us things that we didn’t already know, perhaps even things that we didn’t want to acknowledge.
As for someone who murders fewer people than he saves, such a person would be superior to me (who saves nobody and kills nobody) and inferior to someone who saves many and kills nobody.
It’s always exciting to see a major venue presenting rationalist material.
You use that word, but the only meaningful source of that obligation, as I see it, is the desire to be a good person.
Yes, but then it sounds like those who have no such altruistic desire are equally justified as those who do. An alternative view of obligation, one which works very well with utilitarianism, is to reject personal identity as a psychological illusion. In that case there is no special difference between “my” suffering and “your” suffering, and my desire to minimize one of these rationally requires me to minimize the other. Many pantheists take such a view of ethics, and I believe its quasi-official name is “open individualism”.
This is a point of divergence, and I find that what ethical systems “teach us” is an area full of skulls.
You would prefer that we had the ethical intuitions and views of the first human beings, or perhaps of their hominid ancestors?
One way of understanding these “zero slack” theories is not that they approve/condemn things as morally good vs morally evil, but rather that they provide a single ordering of actions from best to worst. There is no negative (evil) half of the spectrum. Some things are just worse than others, and you should aspire to do the best that you can, an idea which I don’t think is counterproductive at all.
I think this requires the assumption that there exists an obligation to end our own suffering.
The obligation in this theory is conditional on you wanting to end your own suffering. If you don’t care about your own suffering, then you have no reason to care about the suffering of others. However, if you do care, then you must also care about the suffering of others.
Phrases like “AI safety” and “AI ethics” probably conjure up ideas closer to machine learning models with socially biased behavior, stock trading bot fiascos, and such. The Yudkowskian paradigm only applies to human-level AGI and above, which few researchers are pursuing explicitly.
I have sympathy for your fears, especially that “not killing everyone” is not sufficient for an AI to be considered well-aligned (arguably, “killing everyone” at least prevents the worst possible scenarios from being realized). This seems to be an area where the line separating AI research from general ethics is blurry, and perhaps technically intractable for that reason.
The “God’s eye view” or “view from nowhere” reminds me of the position of the eliminativists in philosophy of mind. They assert that a physical world does exist, but that we are mistaken in thinking we have first-person experiences like redness and painfulness.
It’s interesting that the term ‘abused’ was used with respect to AI. It makes me wonder if the bill has misalignment risks in mind at all, or only misuse risks.
I would be very surprised if they had anything like the Yudkowskian paradigm in mind when they were thinking of this.
Bostrom and MIRI being cited is pretty cool. I would have thought they’d be outside the Overton window. EDIT: Do you know when the earliest citations occurred?
Complexity in the case of humans is not evidence for the distinction; the standard position is that the stuff you are describing is the complexity of an extrapolated human wrapper-mind goal, not a different kind of value: a paperclipper whose goals are just much more detailed. From that point of view, the response to your post is “Huh?”; it doesn’t engage the crux of the disagreement.
“Huh?” was exactly my reaction. My values don’t vary depending on any environmental input; after all, they are the “ground truths” that give meaning to everything else.
Are you looking for particular examples of AI doing impressive stuff (like PaLM explaining jokes), or do you already have enough examples of that sort to draw from? One thing I would emphasize to your students is how easy it is to underestimate a system that is near us in certain abilities but not very human-like psychologically: for example, see this recent discussion about GPT-3.
Reading AI safety articles like this one, I always find myself nodding along in agreement. The conclusions simply follow from the premises, and the premises are so reasonable. Yet by the end, I always feel futility and frustration. Anyone who wanted to argue that AI safety was a hopeless program wouldn’t need to look any further than the AI safety literature! I’m not just referring to “death with dignity”. What fills me with dread and despair is paragraphs like this:
However, optimists often take a very empiricist frame, so they are likely to be interested in what kind of ML experiments or observations about ML models might change my mind, as opposed to what kinds of arguments might change my mind. I agree it would be extremely valuable to understand what we could concretely observe that would constitute major evidence against this view. But unfortunately, it’s difficult to describe simple and realistic near-term empirical experiments that would change my beliefs very much, because models today don’t have the creativity and situational awareness to play the training game. [original emphasis]
Here is the real chasm between the AI safety movement and the ML industry/academia. One field is entirely driven by experimental results; the other is dominated so totally by theory that its own practitioners deny that there can be any meaningful empirical aspect to it, at least, not until the moment when it’s too late to make any difference.
Years ago, I read an article about an RL agent wireheading itself via memory corruption, thereby ignoring its intended task. Either this article exists and I can’t find it now, or I’m misremembering. Either way, it’s exactly the sort of research that the AI safety community should be conducting and publishing right now (i.e. propaganda with epistemic benefits). With things like GPT-3 around nowadays, I bet one could even devise experiments where artificial agents learn to actually deceive humans (via Mechanical Turk, perhaps?). Imagine how much attention such an experiment could generate once journalists pick it up!
EDIT: This post is very close to what I have in mind.
Yes, but this century will be the decisive one. The phrase “most important century” isn’t claiming that future centuries lack moral significance, but the contrary.
Your attitude extends far beyond morality and dissolves all problems in general, since we can simply decide that something isn’t a problem.