Well, if you or I were to suggest that the best way to achieve universal human happiness was to forcibly rewire the brain of everyone on the planet so they became happy when sitting in bottles of dopamine, most other human beings would probably take that as a sign of insanity.
Richard, I know (real!) people who think that wireheading is the correct approach to life, who would do it to themselves if it were feasible, and who would vote for political candidates if they pledged to legalize or fund research into wireheading. (I realize this is different from forcible wireheading, but unless I’ve misjudged your position seriously I don’t think you see the lack of consent as the only serious issue with that proposal.)
I disagree with those people; I don’t want to wirehead myself. But I notice that I am uncertain about many issues:
Should they be allowed to wirehead? Relatedly, is it cruel of me to desire that they not wirehead themselves? Both of these issues are closely related to the issue of suicide—I do, at present, think it should be legal for others to kill themselves, and that it would be cruel of me to desire that they not kill themselves, rather than desiring that they not want to kill themselves.
Are there philosophical foundations that I could rely on to convince them that, while they might want to wirehead now, after reflecting on those foundations, they would not want to wirehead? Do I endorse those foundations for myself, or am I mistaken in not wanting to wirehead?
How should my feelings on wireheading impact my feelings on related issues, like video gaming? I definitely became much more ambivalent about my video gaming hobby when I saw the parallels to wireheading, and the generalized case of superstimuli deserves philosophical attention. But it could instead be that, by endorsing video gaming (at least for others), I should also endorse wireheading, as just the best possible video game.
It seems to me that, rather than acknowledging that these questions are serious and unanswered, and thus seeking to find answers, you instead seek to dismiss those questions as not worth asking, and thus not worth answering. I feel like this is the wrong approach to philosophy, for many reasons.
I also suspect that this attitude generalizes: there are many places where I see Yudkowsky and similar thinkers saying “hold on a minute, what feature of reality can we point to that generates X?” and you seem to be responding with “but clearly X will be generated,” or “but superintelligence will generate X.” Is it actually clear? Will it be obvious in prospect that an AI design is superintelligent at determining what we want, rather than just obvious in retrospect? What features of the AI design should we be looking for?
For example, you point out that Swarm Relaxation Intelligences (SRIs) lack some dangerous features of CLAIs. This seems like a move in the right direction, but a proof of safety is not just the absence of known errors. It seems wise to let safety and verifiability drive our design choices, and to develop a strong understanding of what features lead to what guarantees (or, at least, what evidence in favor of safety or alignment).
You make a valid point—and one worth discussing at length, sometime—but the most important thing right now is that you have misunderstood my position on the question.
First of all, there is a very big distinction between a few people (or even the whole population!) making a deliberate choice to wirehead, and the nanny AI deciding to force everyone to wirehead because that is its interpretation of “making humans happy” (and doing so in a context in which those humans do not want to do it).
You’ll notice that in the above quote from my essay, I said that most people would consider it a sign of insanity if a human being were to suggest forcing ALL humans to wirehead, and doing so on the grounds that this was the best way to achieve universal human happiness. If that same human were to suggest that we should ALLOW some humans to wirehead if they believed it would make them happy, then I would not for a moment label that person insane, and quite a lot of people would react the same way.
So I want to be very clear: I very much do acknowledge that the questions regarding various forms of voluntary wireheading are serious and unanswered. I’m in complete agreement with you on that score. But in my paper I was talking only about the apparent contradiction between (a) forcing people to do something as they screamed their protests, while claiming that those people had asked for this to be done, and (b) an assessment that this behavior was both intelligent and sane. My claim in the above quote was that there is a prima facie case to be made that the proper conclusion is that the behavior would indeed not be intelligent and sane.
(Bear in mind that the quote was just a statement of a prima facie case. The point was not to declare that the AI really is insane and/or not intelligent, but to say that there are grounds for questioning. Then the paper goes on to look into the whole problem in more detail. And, most important of all, I am trying to suggest that responding to this by simply declaring that the AI has a ‘different’ type of intelligence, or even a superior type of ‘intelligence’, would be a glaring example of sweeping the prima facie case straight into the trashcan without even looking at it.)
You go on to make a different point:
I also suspect that this attitude generalizes: there are many places where [...] you seem to be responding with “but clearly X will be generated,” or “but superintelligence will generate X.” Is it actually clear? Will it be obvious in prospect that an AI design is superintelligent at determining what we want, rather than just obvious in retrospect? What features of the AI design should we be looking for?
Well, I hope I have reassured you that, even if I do do that, it would not be a generalization of my attitude.
But now, do I do that? I try really hard not to take anything for granted and simply make an appeal to the obviousness of any idea. So you will have to give me some case-by-case examples if you think I really have done that.
You mention only one, and it is a biggie.
The whole issue of proving safety is deeply problematic (a whole essay in itself). I tried to talk about it in the above, but there was not enough space to develop it fully.
The basic points are these:
1) Proof-of-correctness techniques can indeed help with some things, but as Selmer Bringsjord found out to his embarrassment at the 2009 AGI conference, there is a problem if anyone thinks that proofs can be had in high-complexity situations. As I explained at that time, there are two things that go into a proof-of-correctness machine: (a) the target whose correctness is to be proved, and (b) a specification of what qualifies as correctness. In those cases where the specification of what qualifies is simply a list of syntactic rules that must be followed, this approach is valid—but when the specification of what qualifies as “correct” becomes huge (for example, it involves a massive, open-ended stipulation of the full meaning of “moral behavior” or “friendliness toward humanity”), the specification ITSELF will have bugs in it, and the specification will be of the same magnitude as the target! This means that the specification needs to be checked for correctness first … and you can see that this quickly leads to an infinite regress.
What this means is that when you point to the security engendered by type-checking in functional programming, you seem to be implying that we really need something like that for checking or proving the safety of AGI systems, and really that is never going to be possible. Yes, certain classes of errors can be guaranteed not to occur in some types of programming, but that will never generalize to systems in which the specification of what counts as correct is large. That is a dream that we need to let go of.
(I started writing a paper about that a few weeks ago, in fact, but I stopped doing it because someone pointed out that MIRI no longer claims that a rigorous proof of Friendliness is even possible. My understanding is, then, that this point I just made is now generally accepted.)
2) However, statistically-oriented ‘proofs’ can be useful in systems where the overall behavior is caused by a large ensemble of interacting processes, and where the overall behavior cannot be hijacked by any one of the atoms in that ensemble. To wit: we can make dramatically precise statements about the ensemble properties of thermodynamic systems, even though the individual atoms can do whatever they like.
That second type of proof-of-friendliness (it is not really a ‘proof’, only a statistical argument) is what I was alluding to when I talked about Swarm Relaxation systems.
So, I don’t think I was just assuming that Swarm Relaxation could lead to stronger claims about friendliness; I was basing the claim on a line of reasoning.
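As an aside, the statistical flavor of guarantee described in point 2 can be illustrated with a toy numerical sketch (my illustration, not from the paper): each “atom” below behaves unpredictably, yet the ensemble average becomes sharply predictable as the ensemble grows.

```python
import random

random.seed(0)  # deterministic run, for illustration only

def atom():
    # Each "atom" is erratic: an independent uniform draw on [0, 1).
    return random.random()

# No single atom is predictable, but the mean of n atoms has standard
# deviation ~ 1/sqrt(12 n), so the ensemble property concentrates near 0.5.
for n in (10, 1_000, 100_000):
    ensemble_mean = sum(atom() for _ in range(n)) / n
    print(n, round(ensemble_mean, 4))
```

Nothing constrains any individual draw, yet the aggregate statement (“the mean is close to 0.5”) gets stronger as n grows; that is the shape of claim a ‘statistical proof’ about ensemble behavior makes.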
But now, do I do that? I try really hard not to take anything for granted and simply make an appeal to the obviousness of any idea. So you will have to give me some case-by-case examples if you think I really have done that.
So, on rereading the paper I was able to pinpoint the first bit of text that made me think this (the quoted text and the bit before), but I am having difficulty finding a second independent example, and so I apologize for the unfairness of generalizing from one example.
The other examples I found looked like they all relied on the same argument. Consider the following section:
The objection I described in the last section has nothing to do with anthropomorphism, it is only about holding AGI systems to accepted standards of logical consistency, and the Maverick Nanny and her cousins contain a flagrant inconsistency at their core.
If I think the “logical consistency” argument does not go through, I shouldn’t treat this as a separate argument that independently fails: it does hold given its premises, and the premise I reject is the same one as before. I clearly had this line in mind also:
for example, when it follows its compulsion to put everyone on a dopamine drip, even though this plan is clearly a result of a programming error
The ‘principal-agent problem’ is a fundamental problem in human institutional design: principals would like to be able to hire agents to perform tasks, but only have crude control over the incentives of the agents, and the agents often have control over what information makes it to the principals. One way to characterize the AI value alignment problem (as I hear MIRI is calling it these days) is that it’s a principal-agent problem where the agent has massive control over the information the principal sees, but the value difference between principals and agents is only due to communication problems, rather than any malice on the part of the agent. That is, the principal wants the agent to do “what I mean,” but the agent only has access to “what I say,” and cannot be assumed to have any mind-reading powers that we don’t build into it.
It seems very difficult to get an AI to correctly classify the difference between programming error and programming intention, and even more difficult for the AI to communicate to us that it has correctly classified that issue. (We have both the illusion of transparency, and the double illusion of transparency to deal with!) Claiming that something is “clearly” a programming error strikes me as trivializing the underlying communication problem. But I agree with you that if we have that problem solved, then we’re home free.
I just wanted to say that I will try to reply soon. Unfortunately :-) some of the comments have been intensely thoughtful, causing me to write enormous replies of my own and saturating my bandwidth. So, apologies for any delay....
Some thoughts in response to the above two comments.
First, don’t forget that I was trying to debunk a very particular idea, rather than other cases. My target was the idea that a future superintelligent AGI could be programmed to have the very best of intentions, and it might claim to be exercising the most extreme diligence in pursuit of human happiness, while at the same time it might think up a scheme that causes most of humanity to scream with horror while it forces the scheme on those humans. That general idea has been promoted countless times (and has been used to persuade people like Elon Musk and Stephen Hawking to declare that AI could cause the doom of the human race), and it has also been cited as an almost inevitable end point of the process of AGI development, rather than just a very-low-risk possibility with massive consequences.
So, with that in mind, I can say that there are many points of agreement between us on the subject of all those cases that you brought up, above, where there are ethical dilemmas of a lesser sort. There is a lot of scope for us having a detailed discussion about all of those dilemmas—and I would love to get into the meat of that discussion, at some point—but that wasn’t really what I was trying to tackle in the paper itself.
(One thing I could say about all those cases is that if the AGI were to “only” have the same dilemmas that we have, when trying to figure out the various ethical conundrums of that sort, then we are no worse off than we are now. Some people use the inability of the AGI to come up with optimal solutions in (e.g.) Trolley problems as a way to conclude that said AGIs would be unethical and dangerous. I strongly disagree with those who take that stance.)
Here is a more important comment on that, though. Everything really comes down to whether the AGI is going to be subject to bizarre/unexpected failures. In other words, it goes along perfectly well for a long time, apparently staying consistent with what we’d expect of an ethically robust robot, and then one day it suddenly does something totally drastic that turns out to have been caused by a peculiar “edge case” that we never considered when we programmed it. (I am reminded of IBM’s Watson answering so many questions correctly on Jeopardy, and then suddenly answering the question “What do grasshoppers eat?” with the utterly stupid reply “Kosher.”).
That issue has been at the core of the discussion I have been having with Jessicat, above. I won’t try to repeat all of what I said there, but my basic position is that, yes, that is the core question, and what I have tried to do is to explain that there is a feasible way to address precisely that issue. That was what all the “Swarm Relaxation” stuff was about.
Finally, to reply to your statement that you think the “logical consistency” idea does not go through, can I ask that you look at my reply to “misterbailey”, elsewhere in the comments? He asked a question, so I tried to clarify exactly where the logical inconsistency was located. Apparently he had misunderstood what the inconsistency was supposed to be. It might be that the way I phrased it there could shed some light on your disagreement with it. Let me know if it does.
it has also been cited as an almost inevitable end point of the process of AGI development, rather than just a very-low-risk possibility with massive consequences.
I suspect this may be because of different traditions. I have a lot of experience in numerical optimization, and one of my favorite optimization stories is Dantzig’s attempt to design an optimal weight-loss diet, recounted here. The gap between a mathematical formulation of a problem, and the actual problem in reality, is one that I come across regularly, and I’ve spent many hours building bridges over those gaps.
As a result, I find it easy to imagine that I’ve expressed a complicated problem in a way that I hope is complete, but the optimization procedure returns a solution that is insane for reality but perfect for the problem as I expressed it. As the role of computers moves from coming up with plans that humans have time to verify (like delivering a recipe to Anne, who can laugh off the request for 500 gallons of vinegar) to executing actions that humans do not have time to verify (like various emergency features of cars, especially the self-driving variety, or high-frequency trading), this possibility becomes more and more worrying. (Even when humans do verify the system, the more trustworthy the system, the more the human operator will trust it—and thus, the more likely that the human will fail to catch a system error.)
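A toy version of the Dantzig diet story (all foods, prices, and calorie counts invented) shows the shape of the failure: the optimizer is flawless on the problem as stated, and the absurdity lives entirely in what was left out of the statement.

```python
from itertools import product

# Hypothetical foods: name -> (cost per unit, calories per unit).
foods = {"bread": (2.0, 250), "cheese": (3.0, 400), "vinegar": (0.1, 20)}

best = None
# Brute-force "optimizer": minimize cost subject only to a calorie floor,
# searching a coarse grid of unit counts. Palatability was never encoded.
for counts in product(range(0, 101, 5), repeat=len(foods)):
    cost = sum(c * spec[0] for c, spec in zip(counts, foods.values()))
    cals = sum(c * spec[1] for c, spec in zip(counts, foods.values()))
    if cals >= 2000 and (best is None or cost < best[0]):
        best = (cost, dict(zip(foods, counts)))

# The "optimal" diet: 100 units of vinegar and nothing else.
print(best[1])
```

The solution is perfect for the problem as expressed and insane for reality, precisely because the constraint that mattered was never written down.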
Similarly, one might think of another technological field whose history is more mature, yet still recent: aircraft design. The statement “that airplanes will crash is almost inevitable” seems more wise than not to me—out of all of the possible designs that you or I would pattern-match to an “airplane,” almost all of them crash. Unfortunately, designs that their designer is sure will work still crash sometimes. Of course, some airplanes work, and we’ve found those designs, and a hundred years into commercial air travel the statement that crashes are almost inevitable seems perhaps silly.
So just like we might acknowledge that it’s difficult to get a plane that flies without crashing, and it’s also difficult to be sure that a design will fly without crashing without testing it, it seems reasonable to claim that it will also be difficult for AGI design to operate without unrecoverable mistakes—but even more so.
Everything really comes down to whether the AGI is going to be subject to bizarre/unexpected failures.
I agree with this, and further, I agree that concepts encoded by many weak constraints will be more robust than concepts encoded by few hard constraints, by intuitions gained from ensemble learners.
I might elaborate the issue further, by pointing out that there is both the engineering issue, of whether or not it fails gracefully in all edge cases, and the communication issue, of whether or not we are convinced that it will fail gracefully. Both false positives and false negatives are horrible.
Finally, to reply to your statement that you think the “logical consistency” idea does not go through, can I ask that you look at my reply to “misterbailey”, elsewhere in the comments? He asked a question, so I tried to clarify exactly where the logical inconsistency was located. Apparently he had misunderstood what the inconsistency was supposed to be. It might be that the way I phrased it there could shed some light on your disagreement with it. Let me know if it does.
I don’t think I agree with point 1 that you raise here:
1) Conclusions produced by my reasoning engine are always correct. [This is the Doctrine of Logical Infallibility]
I think that any active system has to implicitly follow a doctrine that I’ll state as “I did the best I could have, knowing what I did then.” That is, I restate your (2) as the system knowing that, living in an uncertain universe and not having logical omniscience, it will eventually make mistakes. Perhaps in response it will shut itself down, and become an inactive system (this is the inconsistency that I think you’re pointing at). Or perhaps it will run the numbers and say “it’s better to try something than do nothing, even after taking the risk of mistakes into account.”
Now, of course, this isn’t an imperative to always act immediately without consideration. Oftentimes, the thing to try is “wait and think of a better plan” or “ask others if this is a good idea or not,” but the problem of discernment shows up again. What logic is the system using to determine when to stop thinking about new plans? What logic is it using to determine whether or not to ask for advice? If it could predict what mistakes it would make ahead of time, it wouldn’t make them!
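The “run the numbers” step above is just an expected-utility comparison; here is a minimal sketch with invented probabilities and payoffs (the hard part, as noted, is estimating these numbers, not the arithmetic).

```python
# Toy act-vs-deliberate decision (all numbers hypothetical).
p_err_now = 0.10               # chance of a costly mistake if we act immediately
p_err_later = 0.05             # more thinking halves the error rate...
cost_of_waiting = 20.0         # ...but value is lost while we deliberate
payoff_good, payoff_bad = 100.0, -50.0

ev_act = (1 - p_err_now) * payoff_good + p_err_now * payoff_bad
ev_wait = (1 - p_err_later) * payoff_good + p_err_later * payoff_bad - cost_of_waiting

# With these particular numbers, trying something beats waiting, mistakes and all.
print(ev_act, ev_wait)
```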
To go back to my analogy of aircraft design: suppose we talked about the “quality” of aircraft, which was some overall gestalt of how well they flew, and we eagerly looked forward to days when aircraft designs become better.
At one point, the worry is raised that future aircraft designs might be too high quality. On the face of it, this sounds ridiculous: how could it be that increasing the quality of the aircraft makes it less safe or desirable? Further elaboration reveals that there are two parts of aircraft design: engines and guidance systems. If engines grow much more powerful, but guidance systems remain the same, the aircraft might become much less safe—a tremble at the controls, and now the aircraft is spinning madly.
Relatedly, I found your “Because intelligence” section unsatisfying, because it seems like it’s resisting that sort of separation—separating ‘intelligence’ into, say, ‘cleverness’ (the ability to find a plan that achieves some consequence) and ‘wisdom’ (the ability to determine the consequences of a plan, and the desirability of those consequences) seems helpful when talking about designing intelligent agents.
I think Eliezer and others point out that systems that are very clever but not very wise are very dangerous and that cleverness and wisdom are potentially generated by different components. It seems to me that your models of intelligence have a deeper connection between cleverness and wisdom, and so you think it’s considerably less likely that we’ll get that sort of dangerous system that is clever but not wise.
the most important thing right now is that you have misunderstood my position on the question.
Thanks for the clarification!
But in my paper I was talking only about the apparent contradiction between (a) forcing people to do something as they screamed their protests, while claiming that those people had asked for this to be done, and (b) an assessment that this behavior was both intelligent and sane.
I’m glad to hear that we’re focusing on this narrow issue, so let me try to present my thoughts on it more clearly. Unfortunately, this involves bringing up many individual examples of issues, none of which I particularly care about; I’m trying to point at the central issue that we may need to instruct an AI how to solve these sorts of problems in general, or we may run into issues where an AI extrapolates its models incorrectly.
When people talk about interpersonal ethics, they typically think in terms of relationships. Two people who meet in the street have certain rules for interaction, and teachers and students have other rules for interaction, and doctors and patients other rules, and so on. When considering superintelligences interacting with intelligences, the sort of rules we will need seem categorically different, and the closest analogs we have now are system designers and engineers interacting with users.
When we consider people interacting with people, we can rely on ‘informed consent’ as our gold standard, because it’s flexible and prevents most bad things while allowing most good things. But consent has its limits; society extends children only limited powers of consent, reserving many (but not all) of them for their parents; some people are determined mentally incapable, and so on. We have complicated relationships where one person acts in trust for another person (I might be unable to understand a legal document, but still sign it on the advice of my lawyer, who presumably can understand that document, or be unable to understand the implications of undergoing a particular medical treatment, but still do it on the advice of my doctor), because the point of those relationships is that one person can trade their specialized knowledge to another person, but the second person is benefited by a guarantee the first person is actually acting in their interest.
We can imagine a doctor wireheading their patient when the patient did not in fact want to be wireheaded, for a wide variety of reasons. I’m neither a doctor nor a lawyer, so I can’t tell you what sort of consent a doctor needs to inject someone with morphine—but it seems to me that sometimes things will be uncertain, and the doctor will drug someone when a doctor with perfect knowledge would not have, but we nevertheless endorse that decision as the best one the doctor could have made at the time.
But informed consent starts being less useful when we get to systems. Consider a system that takes biomedical data from every person in America seeking an organ transplant, and the biomedical data from donated organs as they become available, and matches organs to patients. Everyone involved likely consented to be involved (or, at least, didn’t not consent if there’s an opt-out organ donation system), but there are still huge ethical questions remaining to be solved. What tradeoffs are we willing to accept between maximizing QALYs and ensuring equity? How much biomedical data can we use to predict the QALYs from any particular transplant, and what constitutes unethical discrimination?
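To make that tradeoff concrete, here is a toy scoring rule for a one-organ, three-candidate version of such a system (names and numbers invented); the unsolved ethical question is precisely what the equity weight should be.

```python
# Each hypothetical candidate: (expected QALYs gained, years already waited).
candidates = {"A": (20.0, 1.0), "B": (12.0, 6.0), "C": (8.0, 9.0)}

def pick(equity_weight):
    # Score trades off medical benefit against time spent on the waiting list.
    def score(name):
        qalys, waited = candidates[name]
        return (1 - equity_weight) * qalys + equity_weight * waited
    return max(candidates, key=score)

print(pick(0.0))  # pure QALY maximization chooses A
print(pick(0.8))  # a heavy equity weight chooses C
```

The code is trivial; choosing the weight, and defending that choice to everyone affected, is the part that is not.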
It seems unlikely to me that the main challenge of superintelligence is that a superintelligence will force or trick us into doing things that we obviously don’t want to do. It seems likely to me that the main challenge is that it will set up systems, potentially with some form of mandatory participation, and we thus need to create a generalized system architect that can solve those ethical, moral, and engineering problems for us while designing arbitrary systems, without us having coded in the exact solutions.
Notice also that consent applies to individual rights, not community rights, but many people’s happiness and livelihoods may rest on community rights. There are already debates over whether or not deafness should be cured: how sad to always be the youngest deaf person, or for deaf culture to disappear, but to avoid that harm, we need some people to be deaf instead of hearing, which is its own harm. Managing this in a way that truly maximizes human flourishing seems like it requires a long description.
Many human ethical problems are solved by rounding small numbers to zero, but superintelligences represent the ability to actually track those small numbers, which means entire legal categories that rest on a sharp divide between 0 and 1 could become smooth. For example, consider ‘sexual harassment’ defined as ‘unwanted advances.’ Should an SI censor any advances it thinks that the receiver will not want, or is that taking sovereignty from the receiver to determine whether or not they want any advance?
My understanding is, then, that this point I just made is now generally accepted.
Right, and I agree with it as well. I think the remaining useful insight of functional programming is that minimizing side effects increases code legibility, and if we want to be confident in the reasoning of an AI system (or an AI system wants to be able to confidently predict the impact of a proposed self-modification) we likely want the code to be partitioned as neatly as possible, so any downstream changes or upstream dependencies can be determined simply.
Neural nets, and related systems, do not have a preference for legibility built into the underlying structure, and so we may not want to use them or related systems for goal management code, or take especial care when connecting them to goal management code.
where the overall behavior cannot be hijacked by any one of the atoms in that ensemble.
Hmm. I’m going to have to think about this claim longer, but right now I disagree. I think the model of human reasoning that seems most natural to me is hierarchical control systems. When I think of “swarms,” that implies to me some sort of homogeneity between the agents, such as might describe groups of humans but not necessarily individual humans. (If we consider humans as swarms of neurons, which is how I originally read your statement, then the ‘swarm-like’ properties map on the control loops of the hierarchical view.)
But it seems to me that if the atoms in the swarm have specialized roles (like neurons), then a small number of atoms behaving strangely can lead to the swarm behaving strangely. (This is easier to see in the controls case, but I think is also sensible in the swarm of neurons model.) I’m thinking of the various extreme cases of psychology as examples: stuff like destroying parts of cats’ brains and seeing how they behave, or the various exotic psychological disorders whose causes can be localized in the brain, or so on. That a system is built out of many subcomponents, instead of being a single logical reasoner, does not seem to me to be a significant source of safety.
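A toy contrast (my illustration, invented numbers): when the parts are interchangeable and combined by a robust statistic, one rogue part is simply outvoted, but when a single part has a specialized role on the output path, it can flip the whole result.

```python
import statistics

# Homogeneous ensemble: 100 interchangeable parts, combined robustly.
readings = [1.0] * 99 + [1000.0]       # one atom goes haywire
print(statistics.median(readings))      # the rogue atom is outvoted

# Specialized role: one "gain" node sits alone on the output path.
def controller(signal, gain):
    return signal * gain

print(controller(1.0, gain=-1.0))       # one faulty node inverts the output
```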
(Now, I do think that various ‘moral congress’ ideas might represent some sort of safety—if you need many value systems to all agree that something is a good idea, then it seems less likely that extreme alternatives will be chosen in exotic scenarios, and a single value system can be a composite of many simpler value systems. This is ideas like ‘bagging’ from machine learning applied to goals—but the gains seem minor to me.)
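The ‘moral congress’ / bagging analogy can be sketched as majority voting among several simple value systems (scoring rules invented for illustration): a plan goes forward only if most systems approve, which screens out options that look acceptable to just one evaluator.

```python
# Each toy "value system" scores a plan; a score >= 0 counts as approval.
value_systems = [
    lambda p: p["happiness"] - 3 * p["coercion"],
    lambda p: p["autonomy"] - 5 * p["coercion"],
    lambda p: p["happiness"] + p["autonomy"] - 1,
]

def congress_approves(plan):
    votes = sum(1 for v in value_systems if v(plan) >= 0)
    return votes > len(value_systems) / 2   # simple majority

wirehead_everyone = {"happiness": 10, "autonomy": -5, "coercion": 4}
ask_first = {"happiness": 3, "autonomy": 2, "coercion": 0}

print(congress_approves(wirehead_everyone))  # False: only one system approves
print(congress_approves(ask_first))          # True: unanimous approval
```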
the security engendered by type-checking in functional programming
No. Dependently-typed theorem proving is the only thing safe enough ;-). That, or the kind of probabilistic defense-in-depth that comes from specifying uncertainty about the goal system and other aspects of the agent’s functioning, thus ensuring that updating on data will make the agent converge to the right thing.
with functional programs, it is possible to ensure through type-checking that certain classes of errors cannot occur. With imperative programs, all testing can do is demonstrate the presence of errors (with absence of evidence being evidence of absence—but not proof of absence).
This doesn’t appear to be correct given that you can always transform functional programs into imperative programs and vice versa.
I’ve never heard that you can program in functional languages without doing testing and relying only on type checking to ensure correct behavior.
In fact, AFAIK, Haskell, the most popular pure functional programming language, is bad enough that you actually have to test all non-trivial programs for memory leaks, since, except in special cases, it is not possible to reason about the memory allocation behavior of a program from its source code and the language specification: the allocation behavior depends on implementation-specific and largely undocumented details of the compiler and the runtime. Anyway, this memory allocation issue may be specific to Haskell, but in general, as I understand it, there is nothing in the functional paradigm that guarantees a higher level of correctness than the imperative paradigm.
I’ve never heard that you can program in functional languages without doing testing and relying only on type checking to ensure correct behavior.
“Certain classes of errors” is meant to be read as a very narrow claim, and I’m not sure that it’s relevant to AI design / moral issues. Many sorts of philosophical errors seem to be type errors, but it’s not obvious that typechecking is the only solution to that. I was primarily drawing on this bit from Programming in Scala, and in rereading it I realize that they’re actually talking about static type systems, which is an entirely separate thing. Editing.
Verifiable properties. Static type systems can prove the absence of certain run-time errors. For instance, they can prove properties like: booleans are never added to integers; private variables are not accessed from outside their class; functions are applied to the right number of arguments; only strings are ever added to a set of strings.
Other kinds of errors are not detected by today’s static type systems. For instance, they will usually not detect non-terminating functions, array bounds violations, or divisions by zero. They will also not detect that your program does not conform to its specification (assuming there is a spec, that is!). Static type systems have therefore been dismissed by some as not being very useful. The argument goes that since such type systems can only detect simple errors, whereas unit tests provide more extensive coverage, why bother with static types at all? We believe that these arguments miss the point. Although a static type system certainly cannot replace unit testing, it can reduce the number of unit tests needed by taking care of some properties that would otherwise need to be tested. Likewise, unit testing can not replace static typing. After all, as Edsger Dijkstra said, testing can only prove the presence of errors, never their absence.[14] So the guarantees that static typing gives may be simple, but they are real guarantees of a form no amount of testing can deliver.
This doesn’t appear to be correct given that you can always transform functional programs into imperative programs and vice versa.
The relevant difference is the isolation and explicit formulation of side effects, which encourages writing more pieces of code whose behavior can be understood precisely in most situations. The toolset of functional programming is usually better for writing higher-order code that keeps the sources of side effects abstract, so that they can be supplied separately, without affecting the rest of the code. As a result, a lot of code can have well-defined behavior that is not disrupted by the context in which it’s used.
This works even without types, but with types the discipline can be more systematically followed, sometimes enforced. It also becomes possible to offload some of the formulation-checking work to a compiler (even when the behavior of a piece of code is well-defined and possible to understand precisely, there is the additional step of making sure it’s used appropriately).
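A minimal sketch of that isolation, with invented names: the pure function’s behavior is fixed entirely by its arguments, so it can be reused in any context, while the `IO` type marks the one place where side effects happen.

```haskell
-- Pure core: behavior determined entirely by the arguments, so it is
-- safe to reuse, test, and reason about in isolation.
discount :: Double -> Double -> Double
discount rate price = price * (1 - rate)

-- Side effects are pushed to the boundary; the IO type in the
-- signature marks exactly where they occur.
main :: IO ()
main = mapM_ (print . discount 0.1) [100, 250, 40]
```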
I’ve never heard that you can program in functional languages without doing testing and relying only on type checking to ensure correct behavior.
See Why Haskell just works. It’s obviously not magic; the point is that enough errors can be ruled out by exploiting types and relying on sparing use of side effects to make a difference in practice. This doesn’t ensure correct behavior (for example, Haskell programs can always enter an infinite loop while promising to eventually produce a value of any type, and Standard ML programs can use side effects that won’t be reflected in types). It’s just a step in the right direction, when correctness is a priority. There is also the prospect that more steps in this direction might eventually get you closer to correctness.
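The infinite-loop caveat can be shown directly; this is a standard Haskell idiom, not anything specific to the linked post: a definition that typechecks at every type, yet never produces a value.

```haskell
-- 'bottom' promises a value of *any* type and typechecks, but
-- evaluating it never terminates.
bottom :: a
bottom = bottom

-- Thanks to laziness, 'bottom' can be passed around safely as long
-- as nothing forces it.
ignoreFirst :: a -> b -> b
ignoreFirst _ y = y

main :: IO ()
main = print (ignoreFirst bottom (2 + 2 :: Int))
```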
I used to love functional programming and the elegance of e.g. Haskell, until I realized functional programming has the philosophy exactly backwards. You want to make it easy for humans and hard for machines, not vice versa.
Humans think causally, e.g. imperatively and statefully. When humans debug functional/lazy programs, they generally smuggle in stateful/causal thinking to make progress. This is a sign that something is going wrong with the philosophy.
Hm. The primary reason I got interested in FP is that I really like SQL; I think it is very easy for the human mind. And LINQ is built on top of functional programming, and the Gigamonkeys book builds a similar query language on top of functional programming and macros, so perhaps FP should be used that way: taking it as far as possible towards building query languages.
But I guess it always depends on what you want to do. My philosophy of programming is automation-based. That means if I need to do something once, I do it by hand; if a thousand times, I write code. This ability to repeat operations many times is what makes automating human work possible, and from it I derived that the most important imperative structure is the loop. The loop is what turns a mere set of rules into powerful data-processing machinery, doing an operation many more times than I care to do it by hand. With SQL, LINQ and other queries, we are essentially optimizing the loop as such. For example, the generator expression in Python is a neat little functional loop-replacement, a mini query language.
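As a hedged sketch of the “mini query language” idea (toy data, invented names), a functional list comprehension reads much like a SQL SELECT/FROM/WHERE, with the loop abstracted away:

```haskell
-- Toy data, invented for the sketch.
people :: [(String, Int)]
people = [("Ada", 36), ("Ben", 12), ("Cyn", 54)]

-- Roughly: SELECT name FROM people WHERE age >= 18.
-- The iteration is implicit; only the query shape remains.
adults :: [String]
adults = [name | (name, age) <- people, age >= 18]

main :: IO ()
main = print adults
```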
The relevant difference is the isolation and explicit formulation of side effects, which encourages writing more pieces of code whose behavior can be understood precisely in most situations. The toolset of functional programming is usually better for writing higher-order code that keeps the sources of side effects abstract, so that they can be supplied separately, without affecting the rest of the code. As a result, a lot of code can have well-defined behavior that is not disrupted by the context in which it’s used.
Yes, that’s how it was intended to be and how they spin it, but in practice the abstraction is leaky, and it leaks in bad, difficult-to-predict ways. Therefore, as I said, you end up with things like having to test for memory leaks, something that is usually not an issue in “imperative” languages like Java, C#, or Python.
I like the functional paradigm inside a good multi-paradigm language: passing around closures as first-class objects is much cleaner and more concise than fiddling with subclasses and virtual methods, but forcing immutability and lazy evaluation as the main principles of the language doesn’t seem to be a good design choice. It forces you to jump through hoops to implement common functionality like interaction, logging, or configuration, and in return it doesn’t deliver the higher modularity and intelligibility that were promised.
Agreed. Abstractions are still leaky, and where some pathologies in abstraction (i.e. human-understandable precise formulation) can be made much less of an issue by using the functional tools and types, others tend to surface that are only rarely a problem for more concrete code. In practice, the tradeoff is not one-sided, so its structure is useful for making decisions in particular cases.
Should they be allowed to wirehead? Relatedly, is it cruel of me to desire that they not wirehead themselves?
Shouldn’t we just offer them a superior alternative?
Are there philosophical foundations that I could rely on to convince them that, while they might want to wirehead now, after reflecting on those foundations, they would not want to wirehead? Do I endorse those foundations for myself, or am I mistaken in not wanting to wirehead?
You can’t imagine anything superior to wireheading? Sad. (Edit: What!? Come on, downvoters: the entire Fun Theory Sequence was written on the idea that there are strictly better things to do with life than nail your happy-dial at maximum. Disagree if you like, but it’s not exactly an unsupported opinion.)
How should my feelings on wireheading impact my feelings on related issues, like video gaming?
Wait. How are those two the same thing? You can criticize games for being escapist, but then you have to ask: escape from what, to what? What sort of “real life” (ie: all of real life aside from video games, since games are a strict subset of real life) are you intending to state is strictly superior in all cases to playing video games?
You can’t imagine anything superior to wireheading? Sad.
What I cannot imagine at present is an argument against wireheading that reliably convinces proponents of wireheading. As it turns out, stating their position and then tacking “Sad” to the end of it does not seem to reliably do so.
How are those two the same thing?
Obviously they are not the same thing. From the value perspective, one of them looks like an extreme extension of the other: games are artificially easy relative to the rest of life, with comparatively hollow rewards, and can be ‘addictive’ because they represent a significantly tighter feedback loop than the rest of life. Wireheading is even easier, even hollower, and even tighter. So if I recoil from the hollowness of wireheading, can I identify a threshold where that hollowness becomes bad, or is it a linear penalty that I cannot ignore as too small to care about when it comes to video gaming? (Clearly, penalizing gaming does not mean I cannot game at all, but it likely means that I game less on the margin.)
What you need here is to unpack your definition of “hollow”.
Let’s go a little further along the spectrum from culturally mainstream definitions of “hollow” to culturally mainstream definitions of “meaningful”.
My hobby is learning Haskell. In fact, just a couple of minutes ago I solved a challenge on HackerRank—writing a convex-hull algorithm in Haskell. This challenged me, and was fun for a fair bit. However, Haskell isn’t my job, and I don’t particularly want a job writing Haskell, nor do I particularly care—upon doing the degree of conscious reflection involved in asking, “Should I spend effort going up a rank on HackerRank, or taking a walk outside in the healthy fresh air?”—about the gamified rewards on HackerRank. From the “objective” point of view, in which my actions are “meaningful” and “non-hollow” when they serve the supergoals of some agent containing me, or some optimization process larger than me (ie: when they serve God, the state, my workplace, academia, humanity, whatever), learning Haskell is almost, but not quite, entirely pointless.
And yet I bet you would still consider it more meaningful and less pointless than a video game, or eating dessert, or anything else done purely for fun.
So again: let’s unpack. I am entirely content to pursue reflectively-coherent fun that is tied up with the rest of reality around me. I can trade off and do Haskell instead of gaming because Haskell is more tied up with the rest of reality around me than gaming. I could also trade off the other way around, as I might if I, for instance, joined a weekly D&D play group. But what I am personally choosing to pursue is reflectively-coherent fun that’s tied up with the rest of reality, not Usefulness to the Greater Glory of Whatever.
Problem is, Usefulness to the Greater Glory of Whatever is, on full information and reflection, itself entirely empty. There is no Greater Whatever. There’s no God, and neither my workplace, nor the state, nor academia, nor “humanity” (which, somehow, never reduces to an actual group of specific individuals), nor evolution possess anything like what most informed people (myself hoping to be included, but alas) would call a property of Meaning-of-Life-Defining-ness. They are, I would say, hollow, in much the same way that you are proposing video-games to be hollow.
I propose that this kind of “hollowness” arises from an infinite recursion in the search for a Grand, Meaning-of-Life-y Supergoal. We’re not in the kind of universe that has one, so there’s no point using that standard in the first place.
What I cannot imagine at present is an argument against wireheading that reliably convinces proponents of wireheading.
When dealing with human emotions, demonstration is usually the only argument. You can’t argue someone into feeding their emotional faculties a proposed scenario in full perceptual detail. The move from conscious, logical faculties to emotional, feeling faculties is a voluntary choice of the person doing the imagining.
Of course, that didn’t stop Eliezer from trying with his Fun Theory Sequence, and a good try it was, too. It’s just not going to convince davkaniks—but nothing does.
You can’t imagine anything superior to wireheading? Sad.
The problem is that since we are not perfectly rational agents, we have difficulty estimating the consequences of our actions, and our conscious preferences are probably not consistent with a von Neumann-Morgenstern utility function.
I don’t want to be wireheaded, but I can’t be sure that if I was epistemically smarter or if my conscious preferences were somehow made more consistent, I would still stand by this decision. My intuition is that I would, but my intuition can be wrong, of course.
Wait. How are those two the same thing? You can criticize games for being escapist, but then you have to ask: escape from what, to what? What sort of “real life” (ie: all of real life aside from video games, since games are a strict subset of real life) are you intending to state is strictly superior in all cases to playing video games?
Video games are designed to stimulate your brain to perform tasks such as spatial/logical problem solving, precise and fast eye-hand coordination, hunting and warfare against other agents, etc. The brain modules that perform these tasks evolved because they increased your chances of survival and reproduction, and the brain reward system also evolved in a way that makes it pleasurable to practice these tasks, since even if the practice doesn’t directly increase your evolutionary fitness, it does so indirectly by training these brain modules. In fact, all mammals play, especially as juveniles, and many also play as adults.
Video games, however, are superstimuli: If you play Call of Duty your eye-hand coordination becomes better, but unless you are a professional hunter or soldier, or something like that, it doesn’t increase your evolutionary fitness, and even if you are, there would be diminishing returns past a certain point, as the game can stimulate your brain modules much more than any “real world” scenario would. Nevertheless, it is pleasurable.
Many people, including myself, would argue that we should not try to blindly maximize our evolutionary fitness. Yet blindly following hedonistic preferences by indulging in superstimuli also seems questionable. Maybe there is an ideal middle ground, or maybe there is no consistent position. The point is, as Vaniver said, that these are difficult and important questions.
Many people, including myself, would argue that we should not try to blindly maximize our evolutionary fitness. Yet blindly following hedonistic preferences by indulging in superstimuli also seems questionable. Maybe there is an ideal middle ground, or maybe there is no consistent position.
It’s not a one-dimensional spectrum with evolutionary fitness on the one end and blind hedonism on the other end in the first place. Your evaluative psychology just doesn’t work that way. As to why you think there exists any such spectrum or trade-off, well, I blame bad philosophy classes and religious preachers: it’s to the clear advantage of moralizing-preacher-types to claim that normal evaluative judgement has no normative substance, and that everyone needs to Work For The Holy Supergoal instead, lest they turn into a drug addict in a ditch (paging high-school health class, as well...).
Richard, I know (real!) people who think that wireheading is the correct approach to life, who would do it to themselves if it were feasible, and who would vote for political candidates if they pledged to legalize or fund research into wireheading. (I realize this is different from forcible wireheading, but unless I’ve misjudged your position seriously I don’t think you see the lack of consent as the only serious issue with that proposal.)
I disagree with those people; I don’t want to wirehead myself. But I notice that I am uncertain about many issues:
Should they be allowed to wirehead? Relatedly, is it cruel of me to desire that they not wirehead themselves? Both of these issues are closely related to the issue of suicide—I do, at present, think it should be legal for others to kill themselves, and that it would be cruel of me to desire that they not kill themselves, rather than desiring that they not want to kill themselves.
Are there philosophical foundations that I could rely on to convince them that, while they might want to wirehead now, after reflecting on those foundations, they would not want to wirehead? Do I endorse those foundations for myself, or am I mistaken in not wanting to wirehead?
How should my feelings on wireheading impact my feelings on related issues, like video gaming? I definitely became much more ambivalent about my video gaming hobby when I saw the parallels to wireheading, and the generalized case of superstimuli deserves philosophical attention. But it could instead be that, by endorsing video gaming (at least for others), I should also endorse wireheading, as just the best possible video game.
It seems to me that, rather than acknowledging that these questions are serious and unanswered, and thus seeking to find answers, you instead seek to dismiss those questions as not worth asking, and thus not worth answering. I feel like this is the wrong approach to philosophy, for many reasons.
I also suspect that this attitude generalizes: there are many places where I see Yudkowsky and similar thinkers saying “hold on a minute, what feature of reality can we point to that generates X?” and you seem to be responding with “but clearly X will be generated,” or “but superintelligence will generate X.” Is it actually clear? Will it be obvious in prospect that an AI design is superintelligent at determining what we want, rather than just obvious in retrospect? What features of the AI design should we be looking for?
For example, you point out that Swarm Relaxation Intelligences (SRIs) lack some dangerous features of CLAIs. This seems like a move in the right direction, but a proof of safety is not just the absence of known errors. It seems wise to let safety and verifiability drive our design choices, and to develop a strong understanding of what features lead to what guarantees (or, at least, what evidence in favor of safety or alignment).
You make a valid point—and one worth discussing at length, sometime—but the most important thing right now is that you have misunderstood my position on the question.
First of all, there is a very big distinction between a few people (or even the whole population!) making a deliberate choice to wirehead, and the nanny AI deciding to force everyone to wirehead because that is its interpretation of “making humans happy” (and doing so in a context in which those humans do not want to do it).
You’ll notice that in the above quote from my essay, I said that most people would consider it a sign of insanity if a human being were to suggest forcing ALL humans to wirehead, and doing so on the grounds that this was the best way to achieve universal human happiness. If that same human were to suggest that we should ALLOW some humans to wirehead if they believed it would make them happy, then I would not for a moment label that person insane, and quite a lot of people would react the same way.
So I want to be very clear: I very much do acknowledge that the questions regarding various forms of voluntary wireheading are serious and unanswered. I’m in complete agreement with you on that score. But in my paper I was talking only about the apparent contradiction between (a) forcing people to do something as they screamed their protests, while claiming that those people had asked for this to be done, and (b) an assessment that this behavior was both intelligent and sane. My claim in the above quote was that there is a prima facie case to be made that the proper conclusion is that the behavior would indeed not be intelligent and sane.
(Bear in mind that the quote was just a statement of a prima facie case. The point was not to declare that the AI really is insane and/or not intelligent, but to say that there are grounds for questioning. Then the paper goes on to look into the whole problem in more detail. And, most important of all, I am trying to suggest that responding to this by simply declaring that the AI has a ‘different’ type of intelligence, or even a superior type of ‘intelligence’, would be a glaring example of sweeping the prima facie case straight into the trashcan without even looking at it.)
You go on to make a different point:
Well, I hope I have reassured you that, even if I do do that, it would not be a generalization of my attitude.
But now, do I do that? I try really hard not to take anything for granted and simply make an appeal to the obviousness of any idea. So you will have to give me some case-by-case examples if you think I really have done that.
You mention only one, and it is a biggie.
The whole issue of proving safety is deeply problematic (a whole essay in itself). I tried to talk about it in the above, but there was not enough space to develop it fully.
The basic points are these:
1) Proof-of-correctness techniques can indeed help with some things, but as Selmer Bringsjord found out to his embarrassment at the 2009 AGI conference, there is a problem if anyone thinks that proofs can be had in high-complexity situations. As I explained at that time, there are two things that go into a proof-of-correctness machine: (a) the target whose correctness is to be proved, and (b) a specification of what qualifies as correctness. In those cases where the specification of what qualifies is simply a list of syntactic rules that must be followed, this approach is valid—but when the specification of what qualifies as “correct” becomes huge (for example, it involves a massive, open-ended stipulation of the full meaning of “moral behavior” or “friendliness toward humanity”), the specification ITSELF will have bugs in it, and the specification will be of the same magnitude as the target! This means that the specification needs to be checked for correctness first … and you can see that this quickly leads to an infinite regress.
What this means is that when you point to the security engendered by type-checking in functional programming, you seem to be implying that we really need something like that for checking or proving the safety of AGI systems, and really that is never going to be possible. Yes, certain classes of errors can be guaranteed not to occur in some types of programming, but that will never generalize to systems in which the specification of what counts as correct is large. That is a dream that we need to let go of.
(I started writing a paper about that a few weeks ago, in fact, but I stopped because someone pointed out that MIRI no longer claims that a rigorous proof of Friendliness is even possible. My understanding, then, is that the point I just made is now generally accepted.)
2) However, statistically-oriented ‘proofs’ can be useful in systems where the overall behavior is caused by a large ensemble of interacting processes, and where the overall behavior cannot be hijacked by any one of the atoms in that ensemble. To wit: we can make dramatically precise statements about the ensemble properties of thermodynamic systems, even though the individual atoms can do whatever they like.
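A toy illustration of the ensemble point, with a made-up pseudo-random generator (nothing here is from the paper): each individual “atom” takes an erratic value, yet the mean of a large ensemble is pinned close to 0.5.

```haskell
-- A hypothetical linear congruential generator, invented for the
-- sketch; individual values jump around unpredictably.
lcg :: Int -> Int
lcg s = (1103515245 * s + 12345) `mod` 2147483648

-- n "atoms", each an erratic value in [0, 1).
atoms :: Int -> [Double]
atoms n = take n (map toUnit (iterate lcg 2017))
  where toUnit s = fromIntegral (s `div` 2147484) / 1000

-- The ensemble property: despite individual erratic behavior, the
-- mean of a large ensemble stays tightly pinned near 0.5.
mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

main :: IO ()
main = print (mean (atoms 10000))
```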
That second type of proof-of-friendliness (it is not really a ‘proof’, only a statistical argument) is what I was alluding to when I talked about Swarm Relaxation systems.
So, I don’t think I was just assuming that Swarm Relaxation could lead to stronger claims about friendliness, I was basing the claim on a line of reasoning.
So, on rereading the paper I was able to pinpoint the first bit of text that made me think this (the quoted text and the bit before), but I am having difficulty finding a second independent example, and so I apologize for the unfairness of generalizing based on one example.
The other examples I found looked like they all relied on the same argument. Consider the following section:
If I think the “logical consistency” argument does not go through, I shouldn’t claim this is an independent argument that doesn’t go through, because this argument holds given the premises (at least one of which I reject, but it’s the same premise). I clearly had this line in mind also:
The ‘principal-agent problem’ is a fundamental problem in human institutional design: principals would like to be able to hire agents to perform tasks, but only have crude control over the incentives of the agents, and the agents often have control over what information makes it to the principals. One way to characterize the AI value alignment problem (as I hear MIRI is calling it these days) is that it’s a principal-agent problem where the agent has massive control over the information the principal sees, but the value difference between principals and agents is only due to communication problems, rather than any malice on the part of the agent. That is, the principal wants the agent to do “what I mean,” but the agent only has access to “what I say,” and cannot be assumed to have any mind-reading powers that we don’t build into it.
It seems very difficult to get an AI to correctly classify the difference between programming error and programming intention, and even more difficult for the AI to communicate to us that it has correctly classified that issue. (We have both the illusion of transparency, and the double illusion of transparency to deal with!) Claiming that something is “clearly” a programming error strikes me as trivializing the underlying communication problem. But I agree with you that if we have that problem solved, then we’re home free.
I just wanted to say that I will try to reply soon. Unfortunately :-) some of the comments have been intensely thoughtful, causing me to write enormous replies of my own and saturating my bandwidth. So, apologies for any delay....
Some thoughts in response to the above two comments.
First, don’t forget that I was trying to debunk a very particular idea, rather than other cases. My target was the idea that a future superintelligent AGI could be programmed to have the very best of intentions, and it might claim to be exercising the most extreme diligence in pursuit of human happiness, while at the same time it might think up a scheme that causes most of humanity to scream with horror while it forces the scheme on those humans. That general idea has been promoted countless times (and has been used to persuade people like Elon Musk and Stephen Hawking to declare that AI could cause the doom of the human race), and it has also been cited as an almost inevitable end point of the process of AGI development, rather than just a very-low-risk possibility with massive consequences.
So, with that in mind, I can say that there are many points of agreement between us on the subject of all those cases that you brought up, above, where there are ethical dilemmas of a lesser sort. There is a lot of scope for us having a detailed discussion about all of those dilemmas—and I would love to get into the meat of that discussion, at some point—but that wasn’t really what I was trying to tackle in the paper itself.
(One thing I could say about all those cases is that if the AGI were to “only” have the same dilemmas that we have when trying to figure out the various ethical conundrums of that sort, then we are no worse off than we are now. Some people use the inability of the AGI to come up with optimal solutions in (e.g.) trolley problems as a way to conclude that said AGIs would be unethical and dangerous. I strongly disagree with those who take that stance.)
Here is a more important comment on that, though. Everything really comes down to whether the AGI is going to be subject to bizarre/unexpected failures. In other words, it goes along perfectly well for a long time, apparently staying consistent with what we’d expect of an ethically robust robot, and then one day it suddenly does something totally drastic that turns out to have been caused by a peculiar “edge case” that we never considered when we programmed it. (I am reminded of IBM’s Watson answering so many questions correctly on Jeopardy, and then suddenly answering the question “What do grasshoppers eat?” with the utterly stupid reply “Kosher.”).
That issue has been at the core of the discussion I have been having with Jessicat, above. I won’t try to repeat all of what I said there, but my basic position is that, yes, that is the core question, and what I have tried to do is to explain that there is a feasible way to address precisely that issue. That was what all the “Swarm Relaxation” stuff was about.
Finally, to reply to your statement that you think the “logical consistency” idea does not go through, can I ask that you look at my reply to “misterbailey”, elsewhere in the comments? He asked a question, so I tried to clarify exactly where the logical inconsistency was located. Apparently he had misunderstood what the inconsistency was supposed to be. It might be that the way I phrased it there could shed some light on your disagreement with it. Let me know if it does.
I suspect this may be because of different traditions. I have a lot of experience in numerical optimization, and one of my favorite optimization stories is Dantzig’s attempt to design an optimal weight-loss diet, recounted here. The gap between a mathematical formulation of a problem, and the actual problem in reality, is one that I come across regularly, and I’ve spent many hours building bridges over those gaps.
As a result, I find it easy to imagine that I’ve expressed a complicated problem in a way that I hope is complete, but the optimization procedure returns a solution that is insane for reality but perfect for the problem as I expressed it. As the role of computers moves from coming up with plans that humans have time to verify (like delivering a recipe to Anne, who can laugh off the request for 500 gallons of vinegar) to executing actions that humans do not have time to verify (like various emergency features of cars, especially the self-driving variety, or high-frequency trading), this possibility becomes more and more worrying. (Even when humans do verify the system, the more trustworthy the system, the more the human operator will trust it—and thus, the more likely that the human will fail to catch a system error.)
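The Dantzig anecdote can be sketched as a toy optimization (all numbers and names invented): the model, exactly as expressed, returns a plan that is optimal on paper and insane in reality, because palatability was never written into the formulation.

```haskell
import Data.List (minimumBy)
import Data.Ord (comparing)

-- Toy data: (name, cost per serving, calories per serving).
foods :: [(String, Double, Double)]
foods = [("vinegar", 0.001, 3), ("bread", 0.50, 250), ("cheese", 1.20, 400)]

-- "Cheapest single-food plan reaching 2000 calories" -- the problem
-- exactly as expressed, with no palatability constraint anywhere.
-- Result fields: (name, servings, total cost).
cheapestPlan :: (String, Double, Double)
cheapestPlan = minimumBy (comparing (\(_, _, c) -> c))
  [ (name, servings, servings * cost)
  | (name, cost, cal) <- foods
  , let servings = 2000 / cal ]

main :: IO ()
main = print cheapestPlan  -- roughly 667 servings of vinegar: optimal for the model, absurd in reality
```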
Similarly, one might think of another technological field whose history is more mature, yet still recent: aircraft design. The statement “that airplanes will crash is almost inevitable” seems more wise than not to me—out of all of the possible designs that you or I would pattern-match to an “airplane,” almost all of them crash. Unfortunately, designs that their designer is sure will work still crash sometimes. Of course, some airplanes work, and we’ve found those designs, and a hundred years into commercial air travel the statement that crashes are almost inevitable seems perhaps silly.
So just like we might acknowledge that it’s difficult to get a plane that flies without crashing, and it’s also difficult to be sure that a design will fly without crashing without testing it, it seems reasonable to claim that it will also be difficult for AGI design to operate without unrecoverable mistakes—but even more so.
I agree with this, and further, I agree that concepts encoded by many weak constraints will be more robust than concepts encoded by few hard constraints, by intuitions gained from ensemble learners.
I might elaborate the issue further, by pointing out that there is both the engineering issue, of whether or not it fails gracefully in all edge cases, and the communication issue, of whether or not we are convinced that it will fail gracefully. Both false positives and false negatives are horrible.
I don’t think I agree with point 1 that you raise here:
I think that any active system has to implicitly follow a doctrine that I’ll state as “I did the best I could have, knowing what I knew then.” That is, I restate your (2) as the system knowing that, living in an uncertain universe and not having logical omniscience, it will eventually make mistakes. Perhaps in response it will shut itself down, and become an inactive system (this is the inconsistency that I think you’re pointing at). Or perhaps it will run the numbers and say “it’s better to try something than do nothing, even after taking the risk of mistakes into account.”
Now, of course, this isn’t an imperative to always act immediately without consideration. Oftentimes, the thing to try is “wait and think of a better plan” or “ask others if this is a good idea or not,” but the problem of discernment shows up again. What logic is the system using to determine when to stop thinking about new plans? What logic is it using to determine whether or not to ask for advice? If it could predict what mistakes it would make ahead of time, it wouldn’t make them!
To go back to my analogy of aircraft design: suppose we talked about the “quality” of aircraft, which was some overall gestalt of how well they flew, and we eagerly looked forward to days when aircraft designs become better.
At one point, the worry is raised that future aircraft designs might be too high quality. On the face of it, this sounds ridiculous: how could it be that increasing the quality of the aircraft makes it less safe or desirable? Further elaboration reveals that there are two parts of aircraft design: engines and guidance systems. If engines grow much more powerful, but guidance systems remain the same, the aircraft might become much less safe—a tremble at the controls, and now the aircraft is spinning madly.
Relatedly, I found your “Because intelligence” section unsatisfying, because it seems like it’s resisting that sort of separation—separating ‘intelligence’ into, say, ‘cleverness’ (the ability to find a plan that achieves some consequence) and ‘wisdom’ (the ability to determine the consequences of a plan, and the desirability of those consequences) seems helpful when talking about designing intelligent agents.
I think Eliezer and others point out that systems that are very clever but not very wise are very dangerous and that cleverness and wisdom are potentially generated by different components. It seems to me that your models of intelligence have a deeper connection between cleverness and wisdom, and so you think it’s considerably less likely that we’ll get that sort of dangerous system that is clever but not wise.
Thanks for the clarification!
I’m glad to hear that we’re focusing on this narrow issue, so let me try to present my thoughts on it more clearly. Unfortunately, this involves bringing up many individual examples of issues, none of which I particularly care about; I’m trying to point at the central issue that we may need to instruct an AI how to solve these sorts of problems in general, or we may run into issues where an AI extrapolates its models incorrectly.
When people talk about interpersonal ethics, they typically think in terms of relationships. Two people who meet in the street have certain rules for interaction, and teachers and students have other rules for interaction, and doctors and patients other rules, and so on. When considering superintelligences interacting with intelligences, the sort of rules we will need seem categorically different, and the closest analogs we have now are system designers and engineers interacting with users.
When we consider people interacting with people, we can rely on ‘informed consent’ as our gold standard, because it’s flexible and prevents most bad things while allowing most good things. But consent has its limits: society extends children only limited powers of consent, reserving many (but not all) of them for their parents; some people are determined mentally incapable; and so on. We also have complicated relationships in which one person acts in trust for another. I might be unable to understand a legal document, but still sign it on the advice of my lawyer, who presumably can understand it, or be unable to understand the implications of undergoing a particular medical treatment, but still do it on the advice of my doctor. The point of those relationships is that one person can trade their specialized knowledge to another, but the second person is benefited by a guarantee that the first is actually acting in their interest.
We can imagine a doctor wireheading their patient when the patient did not in fact want to be wireheaded, for a wide variety of reasons. I’m neither a doctor nor a lawyer, so I can’t tell you what sort of consent a doctor needs to inject someone with morphine—but it seems to me that sometimes things will be uncertain, and the doctor will drug someone when a doctor with perfect knowledge would not have, but we nevertheless endorse that decision as the best one the doctor could have made at the time.
But informed consent starts being less useful when we get to systems. Consider a system that takes biomedical data from every person in America seeking an organ transplant, and the biomedical data from donated organs as they become available, and matches organs to patients. Everyone involved likely consented to be involved (or, at least, didn’t not consent if there’s an opt-out organ donation system), but there are still huge ethical questions remaining to be solved. What tradeoffs are we willing to accept between maximizing QALYs and ensuring equity? How much biomedical data can we use to predict the QALYs from any particular transplant, and what constitutes unethical discrimination?
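To make the shape of that remaining ethical question concrete, here is a toy sketch (all names and numbers are hypothetical; this is not a real allocation algorithm) of how such a matching system might expose the QALY-versus-equity tradeoff as a single tunable parameter:

```python
# Toy sketch (hypothetical names throughout): one way an allocation system
# might make the QALY-vs-equity tradeoff an explicit, tunable parameter.

def match_score(predicted_qalys, months_waiting, equity_weight=0.5):
    """Combine predicted benefit with a fairness term.

    equity_weight = 0 maximizes expected QALYs alone; higher values favor
    patients who have waited longer. Choosing this number IS the ethical
    question -- the code only makes the tradeoff explicit.
    """
    return (1 - equity_weight) * predicted_qalys + equity_weight * (months_waiting / 12)

patients = [
    {"name": "A", "predicted_qalys": 9.0, "months_waiting": 3},
    {"name": "B", "predicted_qalys": 6.0, "months_waiting": 30},
]

# With equity_weight=0 patient A wins on raw QALYs; weight the queue
# heavily and patient B wins instead.
best = max(
    patients,
    key=lambda p: match_score(p["predicted_qalys"], p["months_waiting"], equity_weight=0.7),
)
```

The point of the sketch is that writing the matching code is the easy part; picking `equity_weight`, and deciding which biomedical features may feed `predicted_qalys` at all, are the unsolved ethical problems.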
It seems unlikely to me that the main challenge of superintelligence is that a superintelligence will force or trick us into doing things that we obviously don’t want to do. It seems likely to me that the main challenge is that it will set up systems, potentially with some form of mandatory participation, and we thus need to create a generalized system architect that can solve those ethical, moral, and engineering problems for us while designing arbitrary systems, without us having coded in the exact solutions.
Notice also that consent applies to individual rights, not community rights, but many people’s happiness and livelihoods may rest on community rights. There are already debates over whether or not deafness should be cured: how sad to always be the youngest deaf person, or for deaf culture to disappear, but to avoid that harm, we need some people to be deaf instead of hearing, which is its own harm. Managing this in a way that truly maximizes human flourishing seems like it requires a long description.
Many human ethical problems are solved by rounding small numbers to zero, but superintelligences represent the ability to actually track those small numbers, which means entire legal categories that rest on a sharp divide between 0 and 1 could become smooth. For example, consider ‘sexual harassment’ defined as ‘unwanted advances.’ Should a SI censor any advances it thinks that the receiver will not want, or is that taking sovereignty from the receiver to determine whether or not they want any advance?
Right, and I agree with it as well. I think the remaining useful insight of functional programming is that minimizing side effects increases code legibility, and if we want to be confident in the reasoning of an AI system (or an AI system wants to be able to confidently predict the impact of a proposed self-modification) we likely want the code to be partitioned as neatly as possible, so any downstream changes or upstream dependencies can be determined simply.
Neural nets, and related systems, do not have a preference for legibility built into the underlying structure, and so we may not want to use them or related systems for goal management code, or take especial care when connecting them to goal management code.
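As a minimal illustration of the legibility point (hypothetical functions, not drawn from any particular AI codebase): a function whose result depends on hidden mutable state can only be checked relative to every prior call, while a pure function can be checked in isolation.

```python
# Illustrative sketch: why minimizing side effects increases legibility.
# The first version's result depends on hidden mutable state; the second
# can be understood (and verified) from its arguments alone.

_history = []

def log_and_total_impure(x):
    _history.append(x)        # side effect: result depends on call order
    return sum(_history)

def total_pure(history, x):
    return sum(history) + x   # behavior fixed entirely by the arguments

# A verifier (human or machine) can reason about total_pure locally;
# log_and_total_impure's downstream effects depend on global context.
```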
Hmm. I’m going to have to think about this claim longer, but right now I disagree. I think the model of human reasoning that seems most natural to me is hierarchical control systems. When I think of “swarms,” that implies to me some sort of homogeneity between the agents, such as might describe groups of humans but not necessarily individual humans. (If we consider humans as swarms of neurons, which is how I originally read your statement, then the ‘swarm-like’ properties map on the control loops of the hierarchical view.)
But it seems to me that if the atoms in the swarm have specialized roles (like neurons), then a small number of atoms behaving strangely can lead to the swarm behaving strangely. (This is easier to see in the controls case, but I think is also sensible in the swarm of neurons model.) I’m thinking of the various extreme cases of psychology as examples: stuff like destroying parts of cats’ brains and seeing how they behave, or the various exotic psychological disorders whose causes can be localized in the brain, or so on. That a system is built out of many subcomponents, instead of being a single logical reasoner, does not seem to me to be a significant source of safety.
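A toy sketch of that claim (purely illustrative names, not a model of any real brain or controller): when the atoms of a system have specialized roles, damaging a single one shifts the behavior of the whole aggregate.

```python
# Toy illustration: in a system of specialized subcomponents, one faulty
# component skews the whole system's output -- being "many parts" rather
# than one logical reasoner is not by itself a source of safety.

def low_level_sensor(reading, broken=False):
    # A single "damaged" component: when broken, it inverts its signal.
    return -reading if broken else reading

def high_level_controller(readings, broken_index=None):
    # The top level simply aggregates its subcomponents' outputs.
    return sum(
        low_level_sensor(r, broken=(i == broken_index))
        for i, r in enumerate(readings)
    )

healthy = high_level_controller([1.0, 1.0, 1.0])                   # all intact
damaged = high_level_controller([1.0, 1.0, 1.0], broken_index=0)   # one atom misbehaves
```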
(Now, I do think that various ‘moral congress’ ideas might represent some sort of safety—if you need many value systems to all agree that something is a good idea, then it seems less likely that extreme alternatives will be chosen in exotic scenarios, and a single value system can be a composite of many simpler value systems. This is ideas like ‘bagging’ from machine learning applied to goals—but the gains seem minor to me.)
No. Dependently-typed theorem proving is the only thing safe enough ;-). That, or the kind of probabilistic defense-in-depth that comes from specifying uncertainty about the goal system and other aspects of the agent’s functioning, thus ensuring that updating on data will make the agent converge to the right thing.
I agree with everything in this comment up to:
This doesn’t appear to be correct given that you can always transform functional programs into imperative programs and vice versa.
I’ve never heard that you can program in functional languages without doing testing and relying only on type checking to ensure correct behavior.
In fact, AFAIK, Haskell, the most popular pure functional programming language, is bad enough that you actually have to test all non-trivial programs for memory leaks: except in special cases, it is not possible to reason about the memory allocation behavior of a program from its source code and the language specification, because the allocation behavior depends on implementation-specific and largely undocumented details of the compiler and the runtime.
Anyway, this memory allocation issue may be specific to Haskell, but in general, as I understand, there is nothing in the functional paradigm that guarantees a higher level of correctness than the imperative paradigm.
“Certain classes of errors” is meant to be read as a very narrow claim, and I’m not sure that it’s relevant to AI design / moral issues. Many sorts of philosophical errors seem to be type errors, but it’s not obvious that typechecking is the only solution to that. I was primarily drawing on this bit from Programming in Scala, and in rereading it I realize that they’re actually talking about static type systems, which is an entirely separate thing. Editing.
Ok, sorry for being nitpicky.
In case it wasn’t clear, thanks for nitpicking, because I was confused and am not confused about that anymore.
The relevant difference is in the isolation and formulation of side effects, which encourages writing more pieces of code whose behavior can be understood precisely in most situations. The toolset of functional programming is usually better for writing higher-order code that keeps the sources of side effects abstract, so that they can be reintroduced separately, without affecting the rest of the code. As a result, a lot of code can have well-defined behavior that’s not disrupted by the context in which it’s used.
This works even without types, but with types the discipline can be more systematically followed, sometimes enforced. It also becomes possible to offload some of the formulation-checking work to a compiler (even when the behavior of a piece of code is well-defined and possible to understand precisely, there is the additional step of making sure it’s used appropriately).
See Why Haskell just works. It’s obviously not magic, the point is that enough errors can be ruled out by exploiting types and relying on spare use of side effects to make a difference in practice. This doesn’t ensure correct behavior (for example, Haskell programs can always enter an infinite loop, while promising to eventually produce a value of any type, and Standard ML programs can use side effects that won’t be reflected in types). It’s just a step in the right direction, when correctness is a priority. There is also a prospect that more steps in this direction might eventually get you closer to correctness.
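For readers outside the Haskell world, here is a rough Python analogue of the idea (example names are mine): encode possible failure in the return type, so that a static checker such as mypy flags callers that forget to handle the failure case.

```python
# Rough Python analogue of "errors ruled out by types": possible failure
# is reflected in the return type (Optional), so a static checker like
# mypy will complain about callers that ignore the None case.

from typing import Optional

def safe_div(a: float, b: float) -> Optional[float]:
    return None if b == 0 else a / b

def use(a: float, b: float) -> float:
    result = safe_div(a, b)
    if result is None:        # the type pushes us to consider this branch
        return 0.0
    return result
```

This doesn’t ensure correctness either (nothing stops `use` from handling `None` badly), but it moves a class of forgotten-edge-case errors from runtime into the checking step, which is the "step in the right direction" being described.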
tangent:
I used to love functional programming and the elegance of e.g. Haskell, until I realized functional programming has the philosophy exactly backwards. You want to make it easy for humans and hard for machines, not vice versa.
Humans think causally, e.g. imperatively and statefully. When humans debug functional/lazy programs, they generally smuggle in stateful/causal thinking to make progress. This is a sign something is going wrong with the philosophy.
Hm. The primary reason I got interested in fp is that I really like SQL; I think it is very easy for the human mind. And LINQ is built on top of functional programming, and the Gigamonkeys book builds a similar query language on top of functional programming and macros, so it seems perhaps fp should be used that way, taking it as far as possible towards making query languages in it.
But I guess it always depends on what you want to do. My philosophy of programming is automation-based: if I need to do something once, I do it by hand; if a thousand times, I write code. This ability to repeat operations many times is what makes automating human work possible, and from it I derived that the most important imperative structure is the loop. The loop is what turns a mere set of rules into powerful data-processing machinery, doing an operation many more times than I care to do it myself. With SQL, LINQ and other queries, we are essentially optimizing the loop as such. For example, the generator expression in Python is a neat little functional loop-replacement, a mini query language.
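For instance, the comprehension/generator style reads like a small query over data (toy records, hypothetical names), with the loop bookkeeping abstracted away:

```python
# The loop-as-query idea: a SQL-ish filter/map pipeline with no explicit
# index or accumulator management.

orders = [
    {"customer": "ann", "total": 120},
    {"customer": "bob", "total": 40},
    {"customer": "ann", "total": 75},
]

# Roughly: SELECT total FROM orders WHERE customer = 'ann'
ann_totals = [o["total"] for o in orders if o["customer"] == "ann"]

# The loop is still there underneath, but it has been folded into the
# query form, like a generator-expression mini-language.
grand_total = sum(o["total"] for o in orders if o["total"] > 50)
```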
Yes, that’s how it was intended to be and how they spin it, but in practice the abstraction is leaky, and it leaks in bad, difficult-to-predict ways. Therefore, as I said, you end up with things like having to test for memory leaks, something that is usually not an issue in “imperative” languages like Java, C# or Python.
I like the functional paradigm inside a good multi-paradigm language: passing around closures as first-class objects is much cleaner and more concise than fiddling with subclasses and virtual methods, but forcing immutability and lazy evaluation as the main principles of the language doesn’t seem to be a good design choice. It forces you to jump through hoops to implement common functionality like interaction, logging or configuration, and in return it doesn’t deliver the higher modularity and intelligibility that were promised.
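A small illustration of the closures-versus-subclassing point (names are hypothetical): the same one-method customization done both ways.

```python
# Subclass style: a whole class exists just to override one virtual method.
class Greeter:
    def greet(self, name):
        return f"Hello, {name}"

class LoudGreeter(Greeter):
    def greet(self, name):
        return f"HELLO, {name.upper()}!"

# Closure style: pass the behavior directly as a first-class value.
def make_greeter(fmt):
    def greet(name):
        return fmt(name)
    return greet

loud = make_greeter(lambda name: f"HELLO, {name.upper()}!")
```

The closure version carries only the varying behavior, with no class ceremony around it; this is the kind of win available without making immutability or laziness mandatory.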
Anyway, we are going OT.
Agreed. Abstractions are still leaky, and where some pathologies in abstraction (i.e. human-understandable precise formulation) can be made much less of an issue by using the functional tools and types, others tend to surface that are only rarely a problem for more concrete code. In practice, the tradeoff is not one-sided, so its structure is useful for making decisions in particular cases.
Shouldn’t we just offer them a superior alternative?
You can’t imagine anything superior to wireheading? Sad. (Edit: What!? Come on, downvoters: the entire Fun Theory Sequence was written on the idea that there are strictly better things to do with life than nail your happy-dial at maximum. Disagree if you like, but it’s not exactly an unsupported opinion.)
Wait. How are those two the same thing? You can criticize games for being escapist, but then you have to ask: escape from what, to what? What sort of “real life” (ie: all of real life aside from video games, since games are a strict subset of real life) are you intending to state is strictly superior in all cases to playing video games?
What I cannot imagine at present is an argument against wireheading that reliably convinces proponents of wireheading. As it turns out, stating their position and then tacking “Sad” to the end of it does not seem to reliably do so.
Obviously they are not the same thing. From the value perspective, one of them looks like an extreme extension of the other; games are artificially easy relative to the rest of life, with comparatively hollow rewards, and can be ‘addictive’ because they represent a significantly tighter feedback loop than the rest of life. Wireheading is even easier, even hollower, and even tighter. So if I recoil from the hollowness of wireheading, can I identify a threshold where that hollowness becomes bad, or is it a linear penalty that I cannot dismiss as too small to matter when it comes to video gaming? (Clearly, penalizing gaming does not mean I cannot game at all, but it likely means that I game less on the margin.)
What you need here is to unpack your definition of “hollow”.
Let’s go a little further along the spectrum from culturally mainstream definitions of “hollow” to culturally mainstream definitions of “meaningful”.
My hobby is learning Haskell. In fact, just a couple of minutes ago I solved a challenge on HackerRank—writing a convex-hull algorithm in Haskell. This challenged me, and was fun for a fair bit. However, Haskell isn’t my job, and I don’t particularly want a job writing Haskell, nor do I particularly care—upon doing the degree of conscious reflection involved in asking, “Should I spend effort going up a rank on HackerRank, or taking a walk outside in the healthy fresh air?”—about the gamified rewards on HackerRank. From the “objective” point of view, in which my actions are “meaningful” and “non-hollow” when they serve the supergoals of some agent containing me, or some optimization process larger than me (ie: when they serve God, the state, my workplace, academia, humanity, whatever), learning Haskell is almost, but not quite, entirely pointless.
And yet I bet you would still consider it more meaningful and less pointless than a video game, or eating dessert, or anything else done purely for fun.
So again: let’s unpack. I am entirely content to pursue reflectively-coherent fun that is tied up with the rest of reality around me. I can trade off and do Haskell instead of gaming because Haskell is more tied up with the rest of reality around me than gaming. I could also trade off the other way around, as I might if I, for instance, joined a weekly D&D play group. But what I am personally choosing to pursue is reflectively-coherent fun that’s tied up with the rest of reality, not Usefulness to the Greater Glory of Whatever.
Problem is, Usefulness to the Greater Glory of Whatever is, on full information and reflection, itself entirely empty. There is no Greater Whatever. There’s no God, and neither my workplace, nor the state, nor academia, nor “humanity” (which, somehow, never reduces to an actual group of specific individuals), nor evolution possess anything like what most informed people (myself hoping to be included, but alas) would call a property of Meaning-of-Life-Defining-ness. They are, I would say, hollow, in much the same way that you are proposing video-games to be hollow.
I propose that this kind of “hollowness” arises from an infinite recursion in the search for a Grand, Meaning-of-Life-y Supergoal. We’re not in the kind of universe that has one, so there’s no point using that standard in the first place.
When dealing with human emotions, demonstration is usually the only argument. You can’t argue someone into feeding their emotional faculties a proposed scenario in full perceptual detail. The move from conscious, logical faculties to emotional, feeling faculties is a voluntary choice of the person doing the imagining.
Of course, that didn’t stop Eliezer from trying with his Fun Theory Sequence, and a good try it was, too. It’s just not going to convince davkaniks—but nothing does.
The problem is that since we are not perfectly rational agents, we have difficulty estimating the consequences of our actions, and our conscious preferences are probably not consistent with a von Neumann-Morgenstern utility function.
I don’t want to be wireheaded, but I can’t be sure that if I was epistemically smarter or if my conscious preferences were somehow made more consistent, I would still stand by this decision. My intuition is that I would, but my intuition can be wrong, of course.
Video games are designed to stimulate your brain to perform tasks such as spatial/logical problem solving, precise and fast eye-hand coordination, hunting and warfare against other agents, etc.
The brain modules that perform these tasks evolved because they increased your chances of survival and reproduction, and the brain reward system also evolved in a way that makes it pleasurable to practice these tasks: even if the practice doesn’t directly increase your evolutionary fitness, it does so indirectly by training these brain modules. In fact, all mammals play, especially as juveniles, and many also play as adults.
Video games, however, are superstimuli: If you play Call of Duty your eye-hand coordination becomes better, but unless you are a professional hunter or soldier, or something like that, it doesn’t increase your evolutionary fitness, and even if you are, there would be diminishing returns past a certain point, as the game can stimulate your brain modules much more than any “real world” scenario would.
Nevertheless, it is pleasurable.
Many people, including myself would argue that we should not try to blindly maximize our evolutionary fitness. Yet, blindly following hedonistic preferences by indulging in superstimuli also seems questionable. Maybe there is a ideal middle ground, or maybe there is no consistent position. The point is, as Vaniver said, that these are difficult and important questions.
It’s not a one-dimensional spectrum with evolutionary fitness on the one end and blind hedonism on the other end in the first place. Your evaluative psychology just doesn’t work that way. As to why you think there exists any such spectrum or trade-off, well, I blame bad philosophy classes and religious preachers: it’s to the clear advantage of moralizing-preacher-types to claim that normal evaluative judgement has no normative substance, and that everyone needs to Work For The Holy Supergoal instead, lest they turn into a drug addict in a ditch (paging high-school health class, as well...).