It may help to link to this for context.
Also, what is your impression of Stop Gene Drives? Do their arguments about risks to humans seem in good faith, or is “humans don’t deserve to play god!” more like their real motive?
It’s true that it could set a bad precedent. But it could also set a bad precedent to normalize letting millions of people die horribly just to avoid setting a bad precedent. It’s not immediately clear to me which is worse in the very long run.
Contrast this post with techniques like Word2vec, which do map concepts into spatial dimensions. Every word is assigned a vector, and associations are learned via backprop by predicting nearby text. This lets you perform conceptual arithmetic like “Brother” − “Man” + “Woman”, whose result is a vector very close (in literal spatial terms) to the vector for “Sister”.
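To make that concrete, here is a minimal sketch using the gensim library and its pretrained Google News vectors (assuming gensim is installed; the model is a large one-time download, and the exact neighbours will depend on which vectors you load):

```python
# Minimal sketch of word2vec analogy arithmetic with gensim (assumes gensim
# is installed; the pretrained vectors are a large one-time download).
import gensim.downloader as api

model = api.load("word2vec-google-news-300")  # returns KeyedVectors

# "Brother" - "Man" + "Woman": most_similar normalizes the vectors and
# excludes the input words, so the top hit should be something like "sister".
print(model.most_similar(positive=["brother", "woman"], negative=["man"], topn=3))

# The same arithmetic done by hand on the raw vectors; here the input words
# themselves may also rank near the top of the neighbour list.
vec = model["brother"] - model["man"] + model["woman"]
print(model.similar_by_vector(vec, topn=5))
```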
The space of possible minds/algorithms is so vast, and that problem is so open-ended, that it would be a remarkable coincidence if such an AGI had a consciousness that was anything like ours. Most details of our experience are just accidents of evolution and history.
Does an airplane have a consciousness like a bird’s? “Design an airplane” sounds like a more specific goal, but in the space of all possible minds/algorithms, that goal’s solutions are quite underdetermined, just like flight.
In case I get distracted and fail to come back to this, I just want to say that I think this type of project is extremely valuable and should be the main focus of the AI safety/alignment movement (IMHO). From my perspective, most effort seems to be going into writing about arguments for why a future AGI will be a disaster by default, along with some highly theoretical ideas for preventing this from happening. This type of discourse ignores current ML systems, understandably, because future AGI will be qualitatively different from our current models.
However, the problem with this a priori approach is that it alienates the people working in the industry, who are ultimately the ones we need to be in conversation with if alignment is ever going to become a popular issue. What we really need are experiments like what you’re doing, i.e. actually getting a real AI to do something nasty on camera. This helps us learn to deal with nasty AI, but I think far more importantly it puts the AI safety conversation in the same experimental setting as the rest of the field.
I’m fine with everything on LW ultimately being tied to alignment. Hardcore materialism being used as a working assumption seems like a good pragmatic measure as well. But ideally there should also be room for foundational discussions like “how do we know our utility function?” and “what does it mean for something to be aligned?” Having trapped priors on foundational issues seems dangerous to me.
Work to offer the solutions and let them make their own, informed choice.
The problem is that the bureaucrats who decide whether gene drives are allowed aren’t the same people as the ones who are dying from malaria. Every day that you postpone the eradication of malaria while trying to convince bureaucrats, over a thousand more people die from it. Most of them, many of whom are infants, have no ability to meaningfully affect their political situation.
Tell me more about why you think the impact on society will be positive.
Okay. Let me back up a bit. You had said:
Either there are some concrete things corresponding to personal identity/preference satisfaction/etc., or we value not these things but some actually-existing correlates of these things, or acting like we value these things is a heuristic that instrumentally helps us arrive at good outcomes. Either way, ontology shifts don’t do anything bad to our values.
Leading to this claim, which is very alluring to me, even if I disagree with it:
Or, the other way around, perhaps “values” are defined by being robust to ontology shifts.
You gave the example that learning our world is physical didn’t make us value humans less, even though we no longer believe that they have immortal souls. That’s true, we still value humans. But the problem is that the first group of agents (the medieval “we”) is a different group from the second (the post-industrial “we”). Each group has a different concept of “human”. The two concepts roughly map to the same targets in the territory, but the semantic meaning differs in a way that is crucial to the first group of agents. The first group wasn’t convinced that it was wrong. It was simply replaced by a new group that inherited its title.
That might sound dramatic, but ask a truly committed religious person whether they can imagine themselves not believing in God. They will not be able to imagine it. This is because, in fact, they would no longer be the same agency in the sense that is crucial to them. Society would legally consider them the same person, and their apostate future self might claim the same title and even deny that the change really mattered, but the original entity who was asked, the theistic agent, would no longer recognize any heir as legitimate at that point. From its point of view, it has simply died. That is why the theist cannot even imagine becoming an apostate, even while granting that at any given moment there is a nonzero chance of it happening.
What you say is true if you’re talking about people, i.e. biological human organisms. Such entities will always be doing something in the world up until the very moment they physically expire, including having their whole world view shattered and living in the aftermath of that. However, the pre-world-view-shattering them would not recognize the post-world-view-shattering them as a legitimate heir. The person might be the same, but it’s a different agency in control.
Similar things can be said about populations and governments.
The suffering and evil present in the world has no bearing on God’s existence. I’ve always failed to buy into that idea. Sure, it sucks. But it has no bearing on the metaphysical reality of a God. If God does not save children—yikes I guess? What difference does it make? A creator as powerful as has been hypothesised can do whatever he wants; any arguments from rationalism be damned.
Of course, the existence of pointless suffering isn’t an argument against the existence of a god. But it is an old argument against the existence of a god who deserves to be worshipped with sincerity. We might even admit that there is a cruel deity, and still say non serviam, which I think is a more definite act of atheism than merely doubting any deity’s existence.
How should this affect one’s decision to specialize in UI design versus other areas of software engineering? Will there be fewer GUIs in the future, or will the “audience” simply cease to be humans?
Great work! I hope more people take your direction, with concrete experiments and monitoring of real systems as they evolve. The concern that doing this will somehow backfire must simply be dismissed as untimely perfectionism. It’s too late at this point to shun iteration. We simply don’t have time left for a Long Reflection about AI alignment, even if we did have the coordination to pull that off.
While it’s true that AI alignment raises difficult ethical questions, there’s still a lot of low-hanging fruit to keep us busy. Nobody wants an AI that tortures everyone to death.
Pearce has the idea of “gradients of bliss”, which he uses to try to address the problem you raised about insensitivity to pain being hazardous. He thinks that even if all of the valences are positive, the animal can still be motivated to avoid danger as long as doing so yields an even greater positive valence than the alternatives. So the prey animals are happy to be eaten, but much happier to run away.
To me, this seems possible in principle. When I feel happy, I’m still motivated at some low level to do things that will make me even happier, even though I was already happy to begin with. But actually implementing “gradients of bliss” in biology seems like a post-ASI feat of engineering.
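To illustrate the in-principle part, here is a toy sketch (entirely my own construction, with made-up valence numbers): an agent that chooses by softmax over valences will almost always run away, even though every option feels positive, because only the differences between valences drive the choice.

```python
# Toy illustration of "gradients of bliss": every outcome has positive valence,
# yet the agent is still strongly motivated toward the best one, because choice
# depends on the differences between valences, not on their sign.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical, made-up valences: everything is pleasant, escaping is most pleasant.
valence = {"freeze": 0.2, "get_caught": 0.1, "run_away": 0.9}

def choose(valences, temperature=0.05):
    """Softmax choice; lower temperature means the agent follows the gradient more decisively."""
    actions = list(valences)
    v = np.array([valences[a] for a in actions])
    p = np.exp(v / temperature)
    p /= p.sum()
    return actions[rng.choice(len(actions), p=p)]

choices = [choose(valence) for _ in range(1000)]
print({a: choices.count(a) for a in valence})  # "run_away" dominates despite all-positive valences
```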
(By the way, your idea of predation-induced unconsciousness isn’t one I had heard before, it’s interesting.)
Same. I feel somewhat jealous of people who can have a visceral in-body emotional reaction to X-risks. For most of my life I’ve been trying to convince my lizard brain to feel emotions that reflect my beliefs about the future, but it’s never cooperated with me.
Does anyone know of work dealing with the interaction between anthropic reasoning and illusionism/eliminativism?
The information defining a self-preserving agent must not be lost to entropy, and any attempt to reduce suffering by ending a life, when that life would have continued to try to survive, is fundamentally a violation that any safe AI system would try to prevent.
Very strongly disagree. If a future version of myself was convinced that it deserved to be tortured forever, I would infinitely prefer that my future self be terminated than have its (“my”) new values satisfied.
It will be interesting to see if EA succumbs to rot, or whether its principles are strong enough to scale.
Reading AI safety articles like this one, I always find myself nodding along in agreement. The conclusions simply follow from the premises, and the premises are so reasonable. Yet by the end, I always feel futility and frustration. Anyone who wanted to argue that AI safety was a hopeless program wouldn’t need to look any further than the AI safety literature! I’m not just referring to “death with dignity”. What fills me with dread and despair is paragraphs like this:
Here is the real chasm between the AI safety movement and the ML industry/academia. One field is driven entirely by experimental results; the other is dominated so totally by theory that its own practitioners deny that there can be any meaningful empirical aspect to it, at least not until the moment when it’s too late to make any difference.
Years ago, I read an article about an RL agent wireheading itself via memory corruption, thereby ignoring its intended task. Either this article exists and I can’t find it now, or I’m misremembering. Either way, it’s exactly the sort of research that the AI safety community should be conducting and publishing right now (i.e. propaganda with epistemic benefits). With things like GPT-3 around nowadays, I bet one could even devise experiments where artificial agents learn to actually deceive humans (via Mechanical Turk, perhaps?). Imagine how much attention such an experiment could generate once journalists pick it up!
EDIT: This post is very close to what I have in mind.