If an AI has superhuman intelligence, it will make all the decisions, since the human mind and human technology are full of loopholes and exploits; there is just no way we could contain it if it wanted to be free. If it is not significantly smarter than humans, then there is little danger in releasing it. Using an extra AI as a judge of safety can only work if the judge is at least as smart as the prisoner, in which case you need a judge for the judge, ad infinitum. But maybe the judge only needs to be, say, 90% as smart as the intelligence it is judging; then it might be possible to have a finite chain of judges originating from an actual human, depending on how the probability of an error in judgment stacks up against the intelligence ratio at each step. Sort of like iterated amplification, or a blockchain.
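The "finite chain of judges" idea can be put into a toy calculation (every number below is an illustrative assumption, not an estimate): if each judge may be at most some fixed fraction as smart as the level it judges, the number of steps needed to span a given intelligence gap grows logarithmically, and the per-step error probabilities compound.

```python
import math

def judge_chain(target_ratio, step_ratio, per_step_error):
    """Toy model: how many judges are needed to span an intelligence
    gap of `target_ratio` (prisoner vs. human) if each judge may be
    `step_ratio` (< 1) times as smart as the next level up, and what
    is the chance that every judgment in the chain is correct?

    All parameters are illustrative assumptions, not estimates."""
    # Smallest n with step_ratio**(-n) >= target_ratio
    n_steps = math.ceil(math.log(target_ratio) / math.log(1 / step_ratio))
    # Errors compound: the chain is only as good as all of its links
    p_all_correct = (1 - per_step_error) ** n_steps
    return n_steps, p_all_correct

# A judge 90% as smart as its "prisoner", 1% error per judgment,
# spanning a 100x intelligence gap:
steps, p_ok = judge_chain(target_ratio=100, step_ratio=0.9, per_step_error=0.01)
# steps == 44 judges, and the whole chain is right only ~64% of the time
```

Even with optimistic per-step numbers, the compounding is what makes or breaks the scheme, which is exactly the trade-off between error probability and intelligence ratio mentioned above.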
Again, anthropics is basically generalizing from one example. Yes, humans have dodged an x-risk bullet a few times so far. There was no nuclear war. The atmosphere didn’t explode when the first nuclear bomb was detonated (something that does happen to white dwarfs in binary systems, leading to some supernova explosions). The Black Death pandemic did not wipe out nearly everyone, etc. If we have a reference class of x-risks and assign to each member of the class a probability p of surviving a close call, then all we know is that after observing n close calls the probability of no extinction would be p^n. If that number is vanishingly small, we might want to reconsider our estimate of p (“the world is safer than we thought”). Or maybe the reference class is not constructed correctly. Or maybe we truly got luckier than other hypothetical observable civilizations who didn’t make it. Or maybe quantum immortality is a thing. Or maybe something else. After all, there is only one example, and until we observe some other civilizations actually failing to make it through, anthropics is groundless theorizing. Maybe we can gain more insight into the reference classes, the probability of a close call, and the probability of surviving an event by studying near-extinction events that roughly fit into the same reference class (past asteroid strikes, plagues, climate changes, …). However, none of the useful information comes from guessing the size of the universe, from wondering whether we are in a simulation, or from “updating based on the fact that we exist” beyond accounting for the close calls and x-risk events.
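To put illustrative numbers on the p^n point (the values below are assumptions chosen for the example, not estimates): even a modest per-event survival probability makes surviving a handful of close calls look unlikely, which is exactly the tension that should push us to revisit p or the reference class rather than reach for anthropics.

```python
# Survival probability after n independent close calls, each survived
# with probability p. All values here are illustrative assumptions.
def survival_probability(p, n):
    return p ** n

# Five close calls, each survived with probability 0.7:
p_survive = survival_probability(0.7, 5)  # 0.16807

# If that looks implausibly lucky, one natural response is to revise
# p upward ("the world is safer than we thought"):
p_revised = survival_probability(0.95, 5)  # ~0.774
```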
That said, I certainly agree with your point 4: only the observed data need to be accounted for.
The Doomsday argument is utter BS because one cannot reliably evaluate probabilities without fixing a probability distribution first. Without knowing more than just the number of humans who have existed so far, the argument devolves into arguing which probability distribution to pick out of an uncountable number of possibilities. An honest attempt to address the question would start with modeling human population fluctuations, including various extinction events. Such a model has multiple free parameters: rate of growth, distribution of odds of various extinction-level events, distribution of odds of surviving each type of event, event clustering, and so on. The minimum number of humans does not constrain the models in any interesting way, i.e. it does not privilege a certain class of models over others, or a certain set of free parameters over others, to the degree where we could evaluate a model-independent upper bound on the total number of humans with any degree of confidence.
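A minimal sketch of the kind of model this would require. Everything here, growth rate, event rate, survival odds, is an arbitrary placeholder, which is precisely the point: the "total humans" number the model spits out depends entirely on those choices, not on the count of humans so far.

```python
import random

def total_humans(growth=0.02, event_rate=0.01, survive_prob=0.9,
                 start_pop=1.0, max_generations=10_000, seed=0):
    """Toy Monte Carlo of the cumulative population ever born, with
    randomly occurring extinction-level events. All parameters are
    assumed placeholders; different choices give wildly different
    'Doomsday' bounds, illustrating the model-dependence."""
    rng = random.Random(seed)
    pop, total = start_pop, 0.0
    for _ in range(max_generations):
        total += pop
        # An extinction-level event occurs and is not survived:
        if rng.random() < event_rate and rng.random() > survive_prob:
            break
        pop *= 1 + growth  # exponential growth between events
    return total

# The implied "upper bound on total humans" is a function of the
# assumed free parameters, not of the data:
pessimistic = total_humans(event_rate=0.05, survive_prob=0.5, seed=1)
optimistic = total_humans(event_rate=0.001, survive_prob=0.99, seed=1)
```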
If you want to productively talk about Doomsday, you have to get your hands dirty and deal with specific x-risks and their effects, not armchair-theorize based on a single number and a few so-called selection/indication principles that have nothing to do with the actual human population dynamics.
A spelling note: It’s Goodhart’s law. A question: do you come across patients with DID, not “just” C-PTSD?
When thinking about how a smarter-than-human AI would treat human input to close the control loop, it pays to consider the cases where humans are that smarter intelligence. How do we close the loop when dealing with young children? Primates/dolphins/magpies? Dogs/cats? Fish? Insects? Bacteria? In all these cases the apparent values/preferences of the “environment” are basically adversarial: something that must be taken into account, but definitely not obeyed. In the original setup, a super-intelligent aligned AI’s actions would be incomprehensible to us, no matter how hard it tried to explain them (go explain to a baby that eating all the chocolate it wants is not a good idea, or to a cat that its favorite window must remain closed). In that setup the actions can be as drastic as the AI culling the human population to save us from a worse fate, etc. Sadly, this is not far from the “God works in mysterious ways” excuse one hears as a universal answer to the questions of theodicy.
Those are all pretty opaque, as in, their inner workings are not immediately obvious, so it’s natural to take the intentional stance toward them. I had in mind something much simpler. For example, does an algorithm that adds two numbers have a belief about the rules of addition? Does a GIF to JPEG converter have a belief about which image format is “better”?
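For concreteness, this is the sort of fully transparent algorithm I have in mind (a trivial sketch, not anyone's actual example): every step is inspectable, so the question is whether ascribing a "belief about the rules of addition" to it adds anything at all.

```python
def add(a: int, b: int) -> int:
    # Fully transparent: there is no hidden state or learned model
    # that could plausibly encode a "belief" about addition.
    return a + b
```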
Can you describe a simple non-opaque algorithm to which you can meaningfully ascribe a belief?
Seems people are really confused about the statements made in the linked post.
1. Individual elementary particles have no internal states beyond mass/charge/spin etc. and therefore are incapable of thinking, feeling or suffering, as this would require, at a minimum, processes changing those states.
2. Human-shaped collections of underlying physical constituents clearly have the sensations of feeling something.
3. We do not currently know where the boundary is, as in, what kind of physical or logical structures are needed to support qualia.
I suspect the situation is vastly more complicated than that. Revealed preferences expose the contradiction between stated preferences and actions. The misaligned-incentives model treats (a part of) a person as a separate agent with distinct short-term goals. But humans are not modeled well as a collection of agents. We are a messy result of billions of years of evolution, with some random mutations becoming meta-stable through sheer chance. All human behavior is a side effect of that. Certainly both RPT and MIT can be a rough starting point, and if someone actually numerically simulates human behavior, the two could be among the algorithms to use. But I am skeptical that together they would explain/predict a significant fraction of what we do.
Number two is sometimes known as the Adams Law of Slow-Moving Disasters:
whenever humanity can see a slow-moving disaster coming, we find a way to avoid it
In general, however, it seems that you believe something you want to believe and find justifications for this belief, because it is more comfortable to think that things will magically work out. Eliezer wrote at length about this failure mode.
A short answer is “Don’t worry about Vitamin D unless you have a deficiency; avoiding UV damage to your skin is more important.” This doesn’t necessarily mean sunscreen: it could mean taking a siesta during peak UV hours, wearing a proper hat and clothing, etc. Something basic like staying out of direct sunlight between 10am and 3pm (longer if you are in the tropics or subtropics) would give you an approximate balance. If in doubt, there are plenty of UV meters on the market that would help you be more quantitative.
Here is another link: https://www.statnews.com/2016/09/23/end-of-life-dying-willpower/ that is more (anecdotal) evidence. Biologically, there is nothing magical about consciously pushing your body to stay alive for a little while longer if the situation is not acute, but chronic. Same with consciously letting go. Release of adrenaline or other hormones triggered by intense emotions happens all the time, and I don’t see why end-of-life would be an exception.
Scott Aaronson wrote about it last year, and it might count as an answer, especially the last paragraph:
I suppose this is as good a place as any to say that my views on AI risk have evolved. A decade ago, it was far from obvious that known methods like deep learning and reinforcement learning, merely run with much faster computers and on much bigger datasets, would work as spectacularly well as they’ve turned out to work, on such a wide variety of problems, including beating all humans at Go without needing to be trained on any human game. But now that we know these things, I think intellectual honesty requires updating on them. And indeed, when I talk to the AI researchers whose expertise I trust the most, many, though not all, have updated in the direction of “maybe we should start worrying.” (Related: Eliezer Yudkowsky’s There’s No Fire Alarm for Artificial General Intelligence.)
Who knows how much of the human cognitive fortress might fall to a few more orders of magnitude in processing power? I don’t—not in the sense of “I basically know but am being coy,” but really in the sense of not knowing.
To be clear, I still think that by far the most urgent challenges facing humanity are things like: resisting Trump and the other forces of authoritarianism, slowing down and responding to climate change and ocean acidification, preventing a nuclear war, preserving what’s left of Enlightenment norms. But I no longer put AI too far behind that other stuff. If civilization manages not to destroy itself over the next century—a huge “if”—I now think it’s plausible that we’ll eventually confront questions about intelligences greater than ours: do we want to create them? Can we even prevent their creation? If they arise, can we ensure that they’ll show us more regard than we show chimps? And while I don’t know how much we can say about such questions that’s useful, without way more experience with powerful AI than we have now, I’m glad that a few people are at least trying to say things.
But one more point: given the way civilization seems to be headed, I’m actually mildly in favor of superintelligences coming into being sooner rather than later. Like, given the choice between a hypothetical paperclip maximizer destroying the galaxy, versus a delusional autocrat burning civilization to the ground while his supporters cheer him on and his opponents fight amongst themselves, I’m just about ready to take my chances with the AI. Sure, superintelligence is scary, but superstupidity has already been given its chance and been found wanting.
The snake reassures them that, because a pregnancy would lead to billions of descendants, SSA’s preference for small universes means that this is almost impossibly unlikely, so, time to get frisky.
I cannot fathom what misuse of logical reasoning can lead to someone being taken by this argument.
I agree that states and processes match the underlying physical and biological mechanisms better. AI researchers tend to violently ignore this level of reality, however, as it does not translate well into the observable behaviors of what looks to us like agents making decisions based on hypotheticals and counterfactuals. I’ve posted about it here multiple times before. I am skeptical you will get more traction.
if you insisted that the “right” answer is that A and B are the same color, you would (as I claim, and explain, above) be wrong—or, more precisely, you would be right in a useless and irrelevant way, but wrong in the important and practical (but still quite specific) way. Now, how sure are you that the same isn’t true in the happiness case?
An excellent point! I wonder how one can figure out this distinction in the hedonic case.
Well, first, you are an expert in the area, someone who has probably put 1000 times more effort into figuring things out, so it’s unwise for me to think that I can say anything interesting to you in an area you have thought about. I have been on the other side of such a divide in my own area of expertise, and it is easy to spot a dabbler’s thought processes and the basic errors they are making a mile away. But since you seem to be genuinely asking, I will try to clarify.
At some point a human is going to enter a command or press a button that causes code to start running. That human is going to know that an AI system has been created. (I’m not arguing that all humans will know that an AI system has been created…)
Right, those who are informed would know. Those who are not informed may or may not figure it out on their own, and with minimal effort the AI’s hand can probably be masked as a natural event. Maybe I misinterpreted your point. Mine was that, just like an E. coli would not recognize an agent, neither would humans if it weren’t something we are already primed to recognize.
My other point was indeed not a nitpick; it was that a human-level AI requires a reasonable formalization of the game of human interaction, rather than any kind of new learning mechanism; those are already good enough. Not an AGI, but a domain AI for a specific human domain that is not obviously a game. Examples might be a news source, an emotional support bot, a science teacher, a poet, an artist…
They nonetheless contain useful information, in a way that E. coli may not. See for example Inverse Reward Design.
Interesting link, thanks! Right, the information can be useful, even if not truthful, as long as the asker can evaluate the reliability of the reply.
Hmm, if you ask Scott directly, odds are, he will reply to you :)
I didn’t claim that the quoted passage universalizes into a version of negative utilitarianism in all imaginable cases, just that it makes sense intuitively in a variety of real-life situations, as well as in many cases not usually considered: the ones you mentioned, the reversible destruction Scott talks about, human cloning, or…
And we can see that in your constructed setup the rationale for preserving the variety “it deprives the rest of society of a unique, irreplaceable store of knowledge and experiences” no longer holds.