I think Eliezer’s meta-ethics is wrong because it’s possible that we live in a world where Eliezer’s “right” doesn’t actually designate anything. That is, where a typical human’s morality, when extrapolated, fails to be coherent. “Right” should still mean something in a world like that, but it doesn’t under Eliezer’s theory.
Also, to jump the gun a bit, your own meta-ethics, desirism, says:
Thus, morality is the practice of shaping malleable desires: promoting desires that tend to fulfill other desires, and discouraging desires that tend to thwart other desires.
What does this mean in the FAI context? To a super-intelligent AI, its own desires, as well as those of everyone else on Earth, can be considered “malleable”, in the sense that it can change all of them if it wanted to. But there might be some other super-intelligent AIs (created by aliens) whose desires it is powerless to change. I hope desirism doesn’t imply that it should change my desires so as to fulfill the alien AIs’ desires...
What should it mean in a world like that?

I haven’t found a satisfactory meta-ethics yet, so I still don’t know. But whatever the answer is, it has to be at least as good as “my current (unextrapolated) preferences”. “Nothing” is worse than that, so it can’t be the correct answer.
This is actually a useful way of looking at what metaethics (decision theory) is: tools for self-improvement, explaining specific ways in which the correctness of actions (or the correctness of other tools of the same kind) can be judged. In this sense, a useless metaethics is one that doesn’t help you determine what should be done, and a wrong metaethics is one that’s actively stupid, suggesting you do things that you clearly shouldn’t (and, for an FAI based on that metaethics, having it correspondingly do things that it shouldn’t).
In this sense, the injunction to do nothing in response to failed assumptions (i.e. no coherence actually present) in CEV is not stupid, since your own non-extrapolated mind is all you’ll end up with in case CEV shuts down. It is a contingency plan for the case where CEV turns out to be useless.
(I find new obvious things everywhere after the recent realization that any explicit consideration an agent knows is subject to the whole agent’s judgment, even “preference” or “logical correctness”. This also explains a bit of our talking past each other in the other thread.)
(I find new obvious things everywhere after the recent realization that any explicit consideration an agent knows is subject to the whole agent’s judgment, even “preference” or “logical correctness”. This also explains a bit of our talking past each other in the other thread.)
I don’t have much idea what you mean here. This seems important enough to write up as more than a parenthetical remark.
I spent a lot of time laboring under the intuition that there’s some “preference” thingie that summarizes all we care about, which we can “extract” from (define using a reference to) people and have an AI optimize. In the lingo of meta-ethics, that would be “right” or “morality”; the term distanced itself from the overly specific “utility”, which also has the disadvantage of forgetting that the prior is essential.
Then, over the last few months, I was capitalizing on finally understanding UDT in May 2010. (Despite having convinced a lot of people that I understood it long before that, I completely failed to get the essential aspect of controlling the referents of fixed definitions, and only recognized in retrospect that what I had figured out by then was actually UDT.) I noticed that a decision problem requires many more essential parts than just preference, and so to specify what people care about, we need the whole human decision problem. But the intuition that attached to preference in particular, which was by then merely one part of the decision problem, still lingered, and so I failed to notice that it is now the whole decision problem, not preference, that is analogous to “right” and “morality” (though not quite, since the decision problem still won’t be the definition of right; it can be judged in turn), and that the whole agent implementing that decision problem is the best tool available to judge them.
This agent, in particular, can find itself judging its own preference, or its own inference system, or its whole architecture, which might or might not specify an explicit inference system as a part, and so on. Whatever explicit consideration it’s moved by, that is, whatever module in the agent (decision problem) it considers, there’s a decision problem of self-improvement in which the agent replaces that module with something else, and things other than that module can have a hand in the decision.
Also, there’s little point in distinguishing “decision problem” and “agent”, even though there is a point in distinguishing a decision problem from what’s right. A decision problem is merely a set of tricks that the agent is willing to use, as is the agent’s own implementation. What that set of tricks wants to do is not specified in any of the tricks, and the tricks can well fail the agent.
When we apply these considerations to humans, it follows that no human can know what they care about; they can only guess (and, indeed, design) heuristic rules for figuring out what they care about, and the same applies when they construct FAIs. So extracting “preference” exactly is not possible; instead, an FAI should be seen as a heuristic, one that is still subject to moral judgment and probably won’t capture it whole, just as humans themselves don’t implement what’s right reliably. Recognizing that an FAI won’t be perfect, and that the things it does are merely ways of more reliably doing the right thing, looks like an important intuition.
(This is apparently very sketchy, and I don’t expect it to get significantly better for at least a few months. I could talk more (thus describing more of the intuition), but not more clearly, because I don’t understand this well myself. An alternative would have me write up some unfinished work that would clarify each particular intuition, but it would likely be of no lasting value, and so should wait for a better rendition instead.)
it follows that no human can know what they care about
This sounds weird, like you’ve driven off a cliff or something. A human mind is a computer of finite complexity. If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty which may or may not be reduced by applying powerful math. Or do I misunderstand you? Maybe the following two questions will help clarify things:
a) Can a paperclipper know what it cares about?
b) How is a human fundamentally different from a paperclipper with respect to (a)?
If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty.
Hence “explicit considerations”, that is, not up to logical uncertainty. Also, you need to know that you care about logic before you can talk of “up to logical uncertainty” as getting you closer to what you want.
Similarly (unhelpfully), everyone knows what they should do up to moral uncertainty.
Can a paperclipper know what it cares about?
No, at least while it’s still an agent in the same sense, so that it still has the problem of self-improvement on its hands and hasn’t disassembled itself into actual paperclips. To a human, a philosophy of precise reasoning about paperclips won’t look like an adequate activity to spend resources on, but for the paperclipper, understanding paperclips really well is important.
OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality? I doubt it.
ETA:
Also, you need to know that you care about logic to talk of “up to logical uncertainty” as getting you closer to what you want.
I defy the possibility that we may “not care about logic” in the sense that you suggest.
OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality?
(Not “morality” here, of course, but its counterpart in the analogy.)
What is to guide its self-improvement? How is it to best convert the Sun into more computing machinery, in the face of logical uncertainty about the consequences of such an action? What is meant by “actually proving it”? Does quantum suicide count as a method for achieving its goal? When should it risk performing an action in the environment, given that it could damage its own hardware as a result? When should it risk improving its inference system, given that the improvement might turn out to increase the time necessary to perform the proof, perhaps even pushing that time beyond what’s physically available in our universe? Heuristics everywhere, and no easy methods for deciding what should be done.
In a decision between what’s logical and what’s right, you ought to choose what’s right.

If you can summarize your reasons for thinking that’s actually a conflict that can arise for me, I’d be very interested in them.

Consider a possible self-improvement that changes your inference system in such a way that it (1) becomes significantly more efficient at inferring the kinds of facts that help you make right decisions, and (2) acquires an additional tiny chance of being inconsistent. If all you care about is correctness, then notice that implementing this self-improvement will make you less correct: it will increase the probability that you’ll produce incorrect inferences in the future. On the other hand, the expected utility of this decision argues that you should take it. This is a conflict, resolved either by self-improving or not.
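To make the shape of this conflict concrete, here is a toy expected-utility calculation; every number in it is invented purely for illustration, not a real estimate:

```python
# Toy model of the self-improvement decision described above.
# All numbers are made up for illustration.

p_inconsistent = 1e-9        # assumed tiny added chance the new system is inconsistent
u_better_inferences = 100.0  # assumed value of faster, more helpful inferences
u_inconsistency = -1e6       # assumed cost of producing incorrect inferences
u_status_quo = 0.0           # keep the current, slower but safer system

eu_upgrade = (1 - p_inconsistent) * u_better_inferences + p_inconsistent * u_inconsistency

# With these numbers eu_upgrade is about 99.999, which exceeds u_status_quo,
# so expected utility says: take the upgrade, even though doing so strictly
# increases the probability of future incorrect inferences.
print(eu_upgrade > u_status_quo)  # True
```

Flip the numbers (say, a much larger inconsistency cost) and the same calculation forbids the upgrade; the conflict between “maximize logical reliability” and “maximize expected rightness” is real either way.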
That’s fair. Yes, agreed that this is a decision between maximizing my odds of being logical and maximizing my odds of being right, which is a legitimate example of the conflict you implied. And I guess I agree that if being right has high utility then it’s best to choose what’s right.
And I guess I agree that if being right has high utility then it’s best to choose what’s right.
Seeking high utility is right (and following rules of logic is right), not the other way around. “Right” is the unreachable standard by which things should be, of which “utility” is merely a heuristic representation.

It isn’t clear to me what that statement, or its negation, actually implies about the world. But I certainly don’t think it’s false.
I’m generally sympathetic towards these intuitions, but I have a few reservations:
Isn’t it possible that it only looks like “heuristics all the way down” because we haven’t dug deep enough yet? Perhaps in the not too distant future, someone will come up with some insights that will make everything clear, and we can just implement that.
What is the nature of morality according to your approach? You say that a human can’t know what they care about (which I assume you use interchangeably with “right”, correct me if I’m wrong here). Is it because they can’t, in principle, fully unfold the logical definition of right, or is it that they can’t even define “right” in any precise way?
This part assumes that your answer to the last question is “the latter”. Usually when someone says “heuristic”, they have a fully precise theory or problem statement that the heuristic is supposed to be an approximate solution to. How is an agent supposed to design a set of heuristics without such a precise definition to guide it? Also, if the agent itself uses the words “morality” or “right”, what do they refer to?
If the answer to the question in 2 is “the former”, do you have any idea what the precise definition of “right” looks like?
Isn’t it possible that it only looks like “heuristics all the way down” because we haven’t dug deep enough yet?
Everything’s possible, but this doesn’t seem plausible at this point, and certainly not at the human level. To conclude that something is not a heuristic but the thing itself, one would need more certainty than can be expected on such a question.
What is the nature of morality according to your approach? You say that a human can’t know what they care about (which I assume you use interchangeably with “right”, correct me if I’m wrong here).
I did use that interchangeably.
Is it because they can’t, in principle, fully unfold the logical definition of right, or is it that they can’t even define “right” in any precise way?
Both (and the latter implies the former). Having an explicit definition would correspond to the “preference” which I discussed in the grandparent comment. But when we talk of something merely “precise”, at least in principle we could hope to obtain a significantly more precise description, maybe even at the human level, which is what meta-ethics should strive to give us. Every useful heuristic is an element of such a description, and some of the heuristics, such as laws of physics, are very precise.
How is an agent supposed to design a set of heuristics without such a precise definition to guide it?
Using the current heuristics, its current implementation, which is understood to be fallible.
Also, if the agent itself uses the words “morality” or “right”, what do they refer to?
Don’t know (knowing would give a definition). To the extent it’s known, see the current heuristics (a long list), and maybe brains.
Essentially, what you’re describing is just the situation that we are actually faced with. I mean, when I use the word “right” I think I mean something but I don’t know what. And I have to use my current heuristics, my current implementation without having a precise theory to guide me.
And you’re saying that this situation is unlikely to change significantly by the time we build an FAI, so the best we can expect to do is equivalent to a group of uploads improving themselves to the best of their abilities.
I tend to agree with this (although I think I assign a higher probability that someone does make a breakthrough than you perhaps do), but it doesn’t really constitute a meta-ethics, at least not in the sense that Eliezer and philosophers use that word.
Essentially, what you’re describing is just the situation that we are actually faced with.
I’m glad it all adds up to normality, given the amount of ink I spilled getting to this point.
And you’re saying that you don’t expect this situation to change significantly by the time we build an FAI, so the best we can do is equivalent to a group of uploads improving themselves to the best of their abilities.
Not necessarily. The uploads construct could in principle be made abstract, with efficient algorithms figuring out the result of the process much more quickly than if it were actually simulated. More specific heuristics could be figured out that make use of computational resources to make better progress, maybe in the early stages by the uploads construct itself.
it doesn’t really constitute a meta-ethics, at least not in the sense that Eliezer and philosophers use that word.
I’m not sure about that. If it’s indeed all we can say about morality right now, then that’s what we have to say, even if it doesn’t belong to the expected literary genre. It’s too easy to invent fake explanations, and an absence of conclusions invites that, whereas a negative conclusion could focus the effort elsewhere.
(Also, I don’t remember particular points on which my current view disagrees with Eliezer’s sequence, although I’d need to re-read it to have a better idea, which I really should, since I only read it as it was posted, when my understanding of the area was zilch.)
I second this request. In particular, please clarify whether “preference” and “logical correctness” are presented here as examples of “explicit considerations”. And should “whole agent” be parsed as including all the sub-agents? Or perhaps as the extrapolated agent?
Perhaps he’s referring to the part of CEV that says “extrapolated as we wish that extrapolated, interpreted as we wish that interpreted”. Even logical coherence becomes in this way a focus of the extrapolation dynamics, and if this criterion should be changed to something else—as judged by the whole of our extrapolated morality in a strange-loopy way—well, so be it. The dynamics should reflect on themselves and consider the foundational assumptions they were built upon, including the compellingness of the basic logic we are currently so certain about—and, of course, only if they really should reflect on themselves in this way.
Anyway, I’d really like to hear what Vladimir has to say about this. Even though it’s often quite hard for me to parse his writings, he does seem to clear things up for me or at least direct my attention towards some new, unexplored areas...
...and continuing from the other comment, the problem here is that one meta-ethical conclusion seems to be that no meta-ethics can actually define what “right” is. So any meta-ethics would shed only a limited amount of light on the question, and is expected to have failure modes, where the structure of the theory is not quite right. It’s a virtue of a meta-ethical theory to point out explicitly some of its assumptions which, if not right, would make the advice it gives incorrect. In this case, we have an assumption of reflective coherence in human value, and a meta-ethics that says that if that assumption fails, then it doesn’t know anything. I’m pretty sure that Eliezer would disagree with the assertion that if any given meta-ethics, including some version of his own, were to state that the notion of “right” is empty, then “right” is indeed empty (see “The Moral Void”).
I haven’t found a satisfactory meta-ethics yet, so I still don’t know. But whatever the answer is, it has to be at least as good as “my current (unextrapolated) preferences”. “Nothing” is worse than that, so it can’t be the correct answer.
Better—for you, maybe. Under your hypothesis, what is good for you would be bad for others—so unless your meta-ethical system privileges you, this line of argument doesn’t seem to follow.
Alonzo Fyfe and I are currently researching and writing a podcast on desirism, and we’ll eventually cover this topic. The most important thing to note right now is that desirism is set up as a theory that explains very specific things: human moral concepts like negligence, excuse, mens rea, and a dozen other things. You can still take the foundational meta-ethical principles of desirism—which are certainly not unique to desirism—and come up with implications for FAI. But they may have little in common with the bulk of desirism that Alonzo usually talks about.
But I’m not trying to avoid your question. These days, I’m inclined to do meta-ethics without using moral terms at all. Moral terms are so confused, and carry such heavy connotational weights, that using moral terms is probably the worst way to talk about morality. I would rather just talk about reasons and motives and counterfactuals and utility functions and so on.
Leaving out ethical terms, what implications do my own meta-ethical views have for Friendly AI? I don’t know. I’m still catching up with the existing literature on Friendly AI.
What are the foundational meta-ethical principles of desirism? Do you have a link?

Hard to explain. Alonzo Fyfe and I are currently developing a structured and technical presentation of the theory, so what you’re asking for is coming, but it may not be ready for many months. It’s a reasons-internalist view, and actually I’m not sure how much of the rest of it would be relevant to FAI.
I think Eliezer’s meta-ethics is wrong because it’s possible that we live in a world where Eliezer’s “right” doesn’t actually designate anything.
In what way? Since the idea hasn’t been given much technical clarity, even if it moves conceptual understanding a long way, it’s hard for me to imagine how one can arrive at confidence in a strong statement like that.
I’m not sure what you’re asking. Are you asking how it is possible that Eliezer’s “right” doesn’t designate anything, or how that implies Eliezer’s meta-ethics is wrong?
I’m asking (1) how is it possible that Eliezer’s “right” doesn’t designate anything, and (2) how could you arrive at such a strong conclusion based on his non-technical writings, since he could just mean something different, or could have insufficient precision in his own idea to determine this property (this is a meta-point possibly subsumed by the first point).
how is it possible that Eliezer’s “right” doesn’t designate anything
Eliezer identifies “right” with “the ideal morality that I would have if I heard all the arguments, to whatever extent such an extrapolation is coherent.” It is possible that human morality, when extrapolated, shows no coherence, in which case Eliezer’s “right” doesn’t designate anything.
how could you arrive at such a strong conclusion based on his non-technical writings, since he could just mean something different, or could have insufficient precision in his own idea to determine this property
Are you saying that Eliezer’s general approach might still turn out to be correct, if we substitute better definitions or understandings of “extrapolation” and/or “coherence”? If so, I agree, and I didn’t mean to exclude this possibility with my original statement. Should I have made it clearer when I said “I think Eliezer’s meta-ethics is wrong” that I meant “based on my understanding of Eliezer’s current ideas”?
It is possible that human morality, when extrapolated, shows no coherence
For example, I have no idea what this means. I don’t know what “extrapolated” means, apart from some vague intuitions, or even what “coherent” means.
Are you saying that Eliezer’s general approach might still turn out to be correct, if we substitute better definitions or understandings of “extrapolation” and/or “coherence”?
Better than what? I have no specific adequate candidates, only a direction of research.
It is possible that human morality, when extrapolated, shows no coherence
For example, I have no idea what this means.
Did you read the thread I linked to in my opening comment, where Marcello and I argued in more detail why we think that? Perhaps we can move the discussion there, so you can point out where you disagree with us or don’t understand us?
Let me respond to that particular argument, though I don’t see how it substantiates the point that morality according to Eliezer’s meta-ethics could be void.
When you’re considering what a human mind would conclude upon considering certain new arguments, you’re thinking of ways to improve it. A natural heuristic is to add opportunity for reflection, but obviously, exposing a mind to “unbalanced” arguments can lead it anywhere. So you suggest a heuristic of looking for areas of “coherence” in the conclusions reached upon exploring different ways of reflecting.
But this “coherence” is also merely a heuristic. What you want is to improve the mind in the right way, not in a coherent way or a balanced way. So you let the mind reflect on strategies for exposing itself to more reflection, and then on strategies for reflecting on those strategies for getting more reflection, and so on, in any way deemed appropriate by the current implementation. There’s probably no escaping this unguided stage, for the most right guide available is the agent itself (unfortunately).
What you end up with won’t have occasion to “regret” past mistakes, for every regret is a recognition of an error, and any error can be corrected (for the most part). What’s wrong with “incoherent” future growth? Does lack of coherence indicate a particular error, something not done right? If it does, that could be corrected. If it doesn’t, everything is fine.
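To make the “coherence” heuristic concrete, here is a minimal sketch; the reflection procedures are arbitrary toy stand-ins, not real extrapolation methods, and the surrounding point stands that even this criterion is itself just one more heuristic open to judgment:

```python
# Minimal sketch of "coherence across different ways of reflecting".
# The reflection procedures are toy functions standing in for genuinely
# different ways of exposing a mind to arguments.

initial_values = {"save children", "seek truth", "acquire paperclips"}

def reflect_philosophy(values):
    # Hypothetical: philosophical reflection drops the paperclip value.
    return values - {"acquire paperclips"}

def reflect_experience(values):
    # Hypothetical: lived experience drops paperclips and adds a value.
    return (values - {"acquire paperclips"}) | {"treat others fairly"}

def reflect_debate(values):
    # Hypothetical: adversarial debate leaves values unchanged.
    return set(values)

procedures = [reflect_philosophy, reflect_experience, reflect_debate]
outcomes = [proc(initial_values) for proc in procedures]

# "Coherence" here = whatever every reflection procedure agrees on.
coherent_core = set.intersection(*outcomes)

if coherent_core:
    print("extrapolate:", coherent_core)
else:
    # The CEV-style contingency discussed earlier: if no coherence is
    # found, shut down, leaving the non-extrapolated minds as they are.
    print("no coherence found; do nothing")
```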
(By the way, this argument could potentially place advanced human rationality and human understanding of decision theory and meta-ethics directly on track to an FAI, with the only way of making an FAI being self-improvement by a human (upload) group.)
I believe that in Eliezer’s meta-ethics, both the extrapolation procedure and the coherence property are to be given fixed logical definitions as part of the meta-ethics, and are not just “heuristics” to be freely chosen by the subject being extrapolated. You seem to be describing your own ideas, which are perhaps similar enough to Eliezer’s to be said to fall under his general approach, but I don’t think can be said to be Eliezer’s meta-ethics.
the only way of making an FAI being self-improvement by a human (upload) group
Seems like a reasonable idea, but again, almost surely not what Eliezer intended.
I believe that in Eliezer’s meta-ethics, both the extrapolation procedure and the coherence property are to be given fixed logical definitions as part of the meta-ethics, and are not just “heuristics” to be freely chosen by the subject being extrapolated.
Why “part of meta-ethics”? That would make sense as part of an FAI design. Surely the details are not to be chosen “freely”, but still, there’s only one criterion for anything, and that’s full morality. For any fixed logical definition, any element of any design, there’s a question of what could improve it, make its consequences better.
I think because Eliezer wanted to ensure a good chance that right_Eliezer and right_random_human turn out to be very similar. If you let each person choose how to extrapolate using their own current ideas, you’re almost certainly going to end up with very different extrapolated moralities.
The point is not that they’ll be different, but that mistakes will be made, making the result not quite right, or more likely not right at all. So at the early stage, one must be very careful and develop a reliable theory of how to proceed, instead of just doing stuff at random, or rather according to current human heuristics.
An extended period of reflection looks like one of the least invasive self-improvement techniques, something that’s expected to make you more reliably right, especially if you’re given the opportunity to decide how the process is to be set up. This could get us to the next stage, and so on. More invasive heuristics can prove too disruptive, wrong in unexpected and poorly understood ways, so that one won’t be able to expect the right outcome without close oversight from moral judgment, which we don’t have in any technically strong enough form as of yet.
Suppose you have the intuition that extended reflection and coherence are good heuristics to guide your extrapolation. I, on the other hand, think that extended reflection as a base human is dangerous, and coherence has nothing to do with what’s right. I’d rather that the extrapolated me experiment with self-modification after only a moderate amount of theorizing, and at the end merge with its counter-factual versions through acausal negotiation.
Suppose further that you end up in control of FAI design, and you want it to take my morality into account. Would you have it extrapolate me using your preferred method, or mine?
Suppose you have the intuition that extended reflection and coherence are good heuristics to guide your extrapolation. I, on the other hand, think that extended reflection as a base human is dangerous, and coherence has nothing to do with what’s right.
What these heuristics discuss are ways of using more resources. The resources themselves are heuristically assumed to be useful, and so we discuss how to use them best.
(Now to slip to an object-level argument for a change.)
Notice the “especially if you’re given the opportunity to decide how the process is to be set up” in my comment. I agree that unnaturally extended reflection is dangerous; we might even run into physiological problems with computations in brains that are too chronologically old. But 50 years is better than 6 months, even if both 50 years and 6 months are dangerous. And if you actually work on planning these reflection sessions, you can set up groups of humans to work for some time, then maybe reset them and have only their writings pass to new humans, filtering such findings through not-older-than-50 humans trained on progressively improved findings, and so on. For most reasons you could give for why it’s dangerous, we could work on finding a solution to that problem. For any experiment with FAI design, we would be better off thinking about it first.
Likewise, if you task 1000 groups of humans with coming up with possible strategies for using the next batch of computational resources (not for doing the most good explicitly, but for developing an even better heuristic understanding of the problem), and you model human research groups as having a risk of falling into reflective death spirals, where all members of a group can fall to a memetic infection that gives no answers to the question they considered, then it seems like a good heuristic to place considerably less weight on suggestions that come up very rarely and don’t get supported by some additional vetting process.
For example, the first batches of research could focus on developing effective training programs in rationality, then in social engineering, voting schemes, and so on. The overall architecture of a future human-level meta-ethics necessary for more dramatic self-improvement (or improvement in the methods of having things done, such as using a non-human AI or a science of deep non-human moral calculations) would come much later.
In short, I’m not talking of anything that opposes the strategies you named, so you’d need to point to incurable problems that make the strategy of thinking more about the problem lead to worse results than randomly making stuff up (sorry!).
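As a toy rendering of the frequency-and-vetting heuristic above (the group lists, threshold, and vetting rule are all invented for illustration):

```python
from collections import Counter

# Many independent groups propose strategies; suggestions that come up
# rarely, or that fail an extra vetting pass, get much less weight.
# Thresholds are arbitrary.

group_suggestions = [
    ["more rationality training", "build voting schemes"],
    ["more rationality training", "self-modify immediately"],
    ["more rationality training", "build voting schemes"],
    # ... imagine 1000 such lists, one per research group
]

counts = Counter(s for group in group_suggestions for s in group)
n_groups = len(group_suggestions)

def vetted(suggestion):
    # Stand-in for an independent vetting process (fresh reviewers, etc.).
    return suggestion != "self-modify immediately"

# Keep suggestions supported by a sizable fraction of groups AND vetted;
# a rare suggestion may just mark a group that fell into a death spiral.
kept = [s for s, c in counts.items() if c / n_groups >= 0.5 and vetted(s)]
print(kept)  # ['more rationality training', 'build voting schemes']
```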
and at the end merge with its counter-factual versions through acausal negotiation.
The current understanding of acausal control (which is a consideration from decision theory, which can in turn be seen as a normative element of a meta-ethics, which is the same kind of consideration as “let’s reflect more”) is inadequate to place the weight of the future on a statement like this. We need to think more about decision theory, in particular, before making such decisions.
Suppose further that you end up in control of FAI design
What does it mean? If I can order a computer around, that doesn’t allow me to know what to do with it.
and you want it to take my morality into account. Would you have it extrapolate me using your preferred method, or mine?
I’d think about the problem more, or try implementing a reliable process for that if I can.
For example, I have no idea what this means. I don’t know what “extrapolated” means, apart from some vague intuitions, or even what “coherent” means.
It means, for instance, that segments of the population who have different ideas on controversial moral questions like abortion or capital punishment actually have different moralities and different sets of values, and that we as a species will never agree on what answers are right, regardless of how much debate or discussion or additional information we have. I strongly believe this to be true.
Clearly, I know all this stuff, so I meant something else, like not having a more precise understanding (which could also easily collapse this surface philosophizing).
Well, yes, I know you know all this stuff. Are you saying we can’t meaningfully discuss it unless we have a precise algorithmic definition of CEV? People’s desires and values are not that precise. I suspect we can only discuss it in vague terms until we come up with some sort of iterative procedure that fits our intuition of what CEV should be, at which point we’ll have to operationally define CEV as that procedure.
Projecting that answer onto my question I get something like “Because ethical systems in which “right” has an actual referent are better, for unspecified reasons, than ones in which it doesn’t, and Wei Dai’s current unextrapolated preferences involve an actual though unspecified referent for “right,” so we can at the very least reject all systems where “right” doesn’t designate anything actual in favor of the system Wei Dai’s current unextrapolated preferences implement, even if nothing better ever comes along.”
I can make the practical case: If “right” refers to nothing, and we design an FAI to do what is right, then it will do nothing. We want the FAI to do something instead of nothing, so “right” having a referent is important.
Or the philosophical case: If “right” refers to nothing, then “it’s right for me to save that child” would be equivalent to the null sentence. From introspection I think I must mean something non-empty when I say something like that.
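The practical case fits in a toy loop: an agent built to take only actions its “right” predicate endorses does nothing when that predicate picks out nothing. The action list and predicate below are invented stand-ins:

```python
# Toy illustration of the practical case: an agent that only takes
# actions its "right" predicate endorses. If the predicate has an empty
# extension, the agent does nothing at all.

available_actions = ["save child", "cure disease", "plant tree"]

def is_right(action):
    # Stand-in for an extrapolation that turned out incoherent:
    # the predicate endorses nothing.
    return False

endorsed = [a for a in available_actions if is_right(a)]

if endorsed:
    print("do:", endorsed[0])
else:
    print("'right' designates nothing here; the agent does nothing")
```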
(sigh) Sure, agreed… if our intention is to build an FAI to do what is right, it’s important that “what is right” mean something. And I could ask why we should build an FAI that way, and you could tell me that that’s what it means to be Friendly, and on and on.
I’m not trying to be pedantic here, but this does seem sort of pointlessly circular… a discussion about words rather than things.
When a Jewish theist says “God has commanded me to save that child,” they may be entirely sincere, but that doesn’t in and of itself constitute evidence that “God” has a referent, let alone that the referent of “God” (supposing it exists) actually so commanded them.
When you say “It’s right for me to save that child,” the situation may be different, but the mere fact that you can utter that sentence with sincerity doesn’t constitute evidence of difference.
If we really want to save children, I would say we should talk about how most effectively to save children, and design our systems to save children, and that talking about whether God commanded us to save children or whether it’s right to save children adds nothing of value to the process.
More generally, if we actually knew everything we wanted, as individuals and groups, then we could talk about how most effectively to achieve that and design our FAIs to achieve that and discussions about whether it’s right would seem as extraneous as discussions about discussions about whether it’s God-willed.
The problem is that we don’t know what we want. So we attach labels to that-thing-we-don’t-understand, and over time those labels adopt all kinds of connotations that make discussion difficult. The analogy to theism applies here as well.
At some point, it becomes useful to discard those labels.
A CEV-implementing FAI, supposing such a thing is possible, will do what we collectively want done, whatever that turns out to be. A FAI implementing some other strategy will do something else. Whether those things are right is just as useless to talk about as whether they are God’s will; those terms add nothing to the conversation.
TheOtherDave, I don’t really want to argue about whether talking about “right” adds value. I suspect it might (i.e., I’m not so confident as you that it doesn’t), but mainly I was trying to argue with Eliezer on his own terms. I do want to correct this:
A CEV-implementing FAI, supposing such a thing is possible, will do what we collectively want done, whatever that turns out to be.
CEV will not do “what we collectively want done”; it will do what’s “right” according to Eliezer’s meta-ethics, which is whatever is coherent among the volitions it extrapolates from humanity, which, as others and I have argued, might turn out to be “nothing”. If you’re proposing that we build an AI that does do “what we collectively want done”, you’d have to define what that means first.
I don’t really want to argue about whether talking about “right” adds value.
OK. The question I started out with, way at the top of the chain, was precisely about why having a referent for “right” was important, so I will drop that question and everything that descends from it.
As for your correction, I actually don’t understand the distinction you’re drawing, but in any case I agree with you that it might turn out that human volition lacks a coherent core of any significance.
To me, “what we collectively want done” means somehow aggregating (for example, through voting or bargaining) our current preferences. It lacks the elements of extrapolation and coherence that are central to CEV.
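For concreteness, here is one of the simplest such aggregations, score voting over a fixed menu of outcomes; the names and numbers are invented, and real bargaining would be far more involved:

```python
# One of the simplest aggregation schemes for "what we collectively want
# done": score voting over a fixed menu of outcomes.

preferences = {
    "alice": {"eradicate disease": 9, "space colonization": 6, "do nothing": 0},
    "bob":   {"eradicate disease": 7, "space colonization": 9, "do nothing": 1},
    "carol": {"eradicate disease": 8, "space colonization": 2, "do nothing": 3},
}

options = {"eradicate disease", "space colonization", "do nothing"}
totals = {opt: sum(p[opt] for p in preferences.values()) for opt in options}

# Note: this uses only current, unextrapolated preferences; there is no
# extrapolation step and no coherence requirement anywhere in it.
winner = max(totals, key=totals.get)
print(winner)  # eradicate disease (24 vs 17 vs 4)
```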
What is the source of the criteria, such as voting or bargaining, that you suggest? Why poll everyone and not poll every prime-indexed citizen instead? It’s always your judgment about what is the right thing to do.
I think Eliezer’s meta-ethics is wrong because it’s possible that we live in a world where Eliezer’s “right” doesn’t actually designate anything. That is, where a typical human’s morality, when extrapolated, fails to be coherent. “Right” should still mean something in a world like that, but it doesn’t under Eliezer’s theory.
Also, to jump the gun a bit, your own meta-ethics, desirism, says:
What does this mean in the FAI context? To a super-intelligent AI, it’s own desires, as well as those of everyone else on Earth, can be considered “malleable”, in the sense that it can change all of them if it wanted to. But there might be some other super-intelligent AIs (created by aliens) whose desires it is powerless to change. I hope desirism doesn’t imply that it should change my desires so as to fulfill the alien AIs’ desires...
What should it mean in a world like that?
I haven’t found a satisfactory meta-ethics yet, so I still don’t know. But whatever the answer is, it has to be at least as good as “my current (unextrapolated) preferences”. “Nothing” is worse than that, so it can’t be the correct answer.
This is actually a useful way of looking at what metaethics (decision theory) is: tools for self-improvement, explaining specific ways in which correctness of actions (or correctness of other tools of the same kind) can be judged. In this sense, useless metaethics is one that doesn’t help you with determining what should be done, and wrong metaethics is one that’s actively stupid, suggesting you to do things that you clearly shouldn’t (for FAI based on that metaethics, correspondingly doing things that it shouldn’t).
In this sense, the injunction of doing nothing in response to failed assumptions (i.e. no coherence actually present) in CEV is not stupid, since your own non-extrapolated mind is all you’ll end up with in case CEV shuts down. It is a contingency plan for the case it turns out to be useless.
(I find new obvious things everywhere after the recent realization that any explicit consideration an agent knows is subject to whole agent’s judgment, even “preference” or “logical correctness”. This also explains a bit of our talking past each other in the other thread.)
I don’t have much idea what you mean here. This seems important enough to write up as more than a parenthetical remark.
I spent a lot of time laboring under the intuition that there’s some “preference” thingie that summarizes all we care about, that we can “extract” from (define using a reference to) people and have an AI optimize it. In the lingo of meta-ethics, that would be “right” or “morality”, and it distanced itself from the overly specific “utility” that also has the disadvantage of forgetting that prior is essential.
Then, over the last few months, as I was capitalizing on finally understanding UDT in May 2010 (despite having convinced a lot of people that I understood it long before that, I completely failed to get the essential aspect of controlling the referents of fixed definitions, and only recognized in retrospect that what I figured out by that time was actually UDT), I noticed that a decision problem requires many more essential parts than just preference, and so to specify what people care about, we need a whole human decision problem. But the intuition that linked to preference in particular, which was by then merely a part of the decision problem, still lingered, and so I failed to notice that now not preference, but the whole decision problem, is analogous to “right” and “morality” (but not quite, since that decision problem still won’t be the definition of right, it can be judged in turn), and the whole agent that implements such decision problem is the best tool available to judge them.
This agent, in particular, can find itself judging its own preference, or its own inference system, or its whole architecture that might or might not specify an explicit inference system as its part, and so on. Whatever explicit consideration it’s moved by, that is whatever module in the agent (decision problem) it considers, there’s a decision problem of self-improvement where the agent replaces that module with something else, and things other than that module can have a hand in deciding.
Also, there’s little point in distinguishing “decision problem” and “agent”, even though there is a point in distinguishing a decision problem and what’s right. Decision problem is merely a set of tricks that the agent is willing to use, as is agent’s own implementation. What that set of tricks wants to do is not specified in any of the tricks, and the tricks can well fail the agent.
When we apply these considerations to humans, it follows that no human can know what they care about, they can only guess (and, indeed, design) heuristic rules for figuring out what they care about, and the same applies to when they construct FAIs. So extracting “preference” exactly is not possible, instead FAI should be seen as a heuristic, that would still be subject to moral judgment and probably won’t capture it whole, just as humans themselves don’t implement what’s right reliably. Recognizing that FAI won’t be perfect, and that things it does are merely ways of more reliably doing the right thing, looks like an important intuition.
(This is apparently very sketchy and I don’t expect it to get significantly better for at least a few months. I could talk more (thus describing more of the intuition), but not clearer, because I don’t understand this well myself. An alternative would have me write up some unfinished work that would clarify each particular intuition, but would be likely of no lasting value, and so should wait for a better rendition instead.)
This sounds weird, like you’ve driven off a cliff or something. A human mind is a computer of finite complexity. If you feed it a complete description of itself, it will know what it cares about, up to logical uncertainty which may or may not be reduced by applying powerful math. Or do I misunderstand you? Maybe the following two questions will help clarify things:
a) Can a paperclipper know what it cares about?
b) How is a human fundamentally different from a paperclipper with respect to (a)?
Hence “explicit considerations”, that is not up to logical uncertainty. Also, you need to know that you care about logic to talk of “up to logical uncertainty” as getting you closer to what you want.
Similarly (unhelpfully), everyone knows what they should do up to moral uncertainty.
No, at least while it’s still an agent in the same sense, so that it still has the problem of self-improvement on its hands, and hasn’t disassembled itself into actual paperclips. For a human, its philosophy of precise reasoning about paperclips won’t look like an adequate activity to spend resources on, but for the paperclipper, understanding paperclips really well is important.
OK, how about this: do you think an AI tasked with proving the Goldbach conjecture from the axioms of ZFC will find itself similarly confused about morality? I doubt it.
ETA:
I defy the possibility that we may “not care about logic” in the sense that you suggest.
(Not “morality” here, of course, but its counterpart in the analogy.)
What is to guide its self-improvement? How is it to best convert the Sun into more computing machinery, in the face of logical uncertainty about consequences of such an action? What is meant by “actually proving it”? Does quantum suicide count as a method for achieving its goal? When should it risk performing an action in the environment, given that it could damage its own hardware as a result? When should it risk improving its inference system, given that there’s a risk that this improvement will turn out to increase the time necessary to perform the proof, perhaps even eventually leading to moving this time outside what’s physically available in our universe? Heuristics everywhere, no easy methods for deciding what should be done.
In a decision between what’s logical and what’s right, you ought to choose what’s right.
If you can summarize your reasons for thinking that’s actually a conflict that can arise for me, I’d be very interested in them.
Consider a possible self-improvement that changes your inference system in such a way that it (1) becomes significantly more efficient at inferring the kinds of facts that help you with making right decisions, and (2) obtains an additional tiny chance of being inconsistent. If all you care about is correctness, then notice that implementing this self-improvement will make you less correct, will increase the probability that you’ll produce incorrect inferences in the future. On the other hand, expected utility of this decision argues that you should take it. This is a conflict, resolved either by self-improving or not.
That’s fair. Yes, agreed that this is a decision between maximizing my odds of being logical and maximizing my odds of being right, which is a legitimate example of the conflict you implied. And I guess I agree that if being right has high utility then it’s best to choose what’s right.
Thanks.
Seeking high utility is right (and following rules of logic is right), not the other way around. “Right” is the unreachable standard by which things should be, which “utility” is merely a heuristic for representation of.
It isn’t clear to me what that statement, or its negation, actually implies about the world. But I certainly don’t think it’s false.
I’m generally sympathetic towards these intuitions, but I have a few reservations:
Isn’t it possible that it only looks like “heuristics all the way down” because we haven’t dug deep enough yet? Perhaps in the not too distant future, someone will come up with some insights that will make everything clear, and we can just implement that.
What is the nature of morality according to your approach? You say that a human can’t know what they care about (which I assume you use interchangeably with “right”, correct me if I’m wrong here). Is it because they can’t, in principle, fully unfold the logical definition of right, or is it that they can’t even define “right” in any precise way?
This part assumes that your answer to the last question is “the latter”. Usually when someone says “heuristic” they have a fully precise theory or problem statement that the heuristic is supposed to be an approximate solution to. How is an agent supposed to design a set of heuristics without a such a precise definition to guide it? Also, if the agent itself uses the words “morality” or “right”, what do they refer to?
If the answer to the question in 2 is “the former”, do you have any idea what the precise definition of “right” looks like?
Everything’s possible, but doesn’t seem plausible at this point, and certainly not at human level. To conclude that something is not a heuristic, but the thing itself, one would need too much certainty to be expected of such a question.
I did use that interchangeably.
Both (the latter). Having an explicit definition would correspond to “preference” which I discussed in the grandparent comment. But when we talk of merely “precise”, at least in principle we could hope to obtain a significantly more precise description, maybe even on human level, which is what meta-ethics should strive to give us. Every useful heuristic is an element of such a description, and some of the heuristics, such as laws of physics, are very precise.
The current heuristics, its current implementation, which is understood to be fallible.
Don’t know (knowing would give a definition). To the extent it’s known, see the current heuristics (long list), maybe brains.
Essentially, what you’re describing is just the situation that we are actually faced with. I mean, when I use the word “right” I think I mean something but I don’t know what. And I have to use my current heuristics, my current implementation without having a precise theory to guide me.
And you’re saying that this situation is unlikely to change significantly by the time we build an FAI, so the best we can expect to do is equivalent to a group of uploads improving themselves to the best of their abilities.
I tend to agree with this (although I think I assign a higher probability that someone does make a breakthrough than you perhaps do), but it doesn’t really constitute a meta-ethics, at least not in the sense that Eliezer and philosophers use that word.
I’m glad it all adds up to normality, given the amount of ink I spilled getting to this point.
Not necessarily. The uploads construct could in principle be made abstract, with efficient algorithms figuring out the result of the process much quickly than if it’s actually simulated. More specific heuristics could be figured out that make use of computational resources to make better progress, maybe on early stages by the uploads construct.
I’m not sure about that. If it’s indeed all we can say about morality right now, then that’s what we have to say, even if it doesn’t belong to the expected literary genre. It’s too easy to invent fake explanations, and absence of conclusions invites that, where a negative conclusion could focus the effort elsewhere.
(Also, I don’t remember particular points on which my current view disagrees with Eliezer’s sequence, although I’d need to re-read it to have a better idea, which I really should, since I only read it as it was posted, when my understanding of the area was zilch.)
I second this request. In particular, please clarify whether “preference” and “logical correctness” are presented here as examples of “explicit considerations”. And whether whole agent should be parsed as including all the sub-agents? Or perhaps as extrapolated agent?
Perhaps he’s refering to the part of CEV that says “extrapolated as we wish that extrapolated, interpreted as we wish that interpreted”. Even logical coherence becomes in this way a focus of extrapolation dynamics, and if this criterion should be changed to something else—as judged by the whole of our extrapolated morality in a strange-loopy way—well, so be it. The dynamics should reflect on itself and consider the foundational assumptions it was built upon, including the compelingness of basic logic we are currently so certain about—and of course, if it really should reflect on itself in this way.
Anyway, I’d really like to hear what Vladimir has to say about this. Even though it’s often quite hard for me to parse his writings, he does seem to clear things up for me or at least direct my attention towards some new, unexplored areas...
...and continuing from the other comment, the problem here is that one meta-ethical conclusion seems to be that no meta-ethics can actually define what “right” is. So any meta-ethics would only pour a limited amount of light on the question, and is expected to have failure modes, where the structure of the theory is not quite right. It’s a virtue of a meta-ethical theory to point out explicitly some of its assumptions, which, if not right, would make the advice it gives incorrect. In this case, we have an assumption of reflective coherence in human value, and a meta-ethics that said that if it’s not so, then it doesn’t know anything. I’m pretty sure that Eliezer would disagree with the assertion that if any given meta-ethics, including some version of his own, would state that the notion of “right” is empty, then “right” is indeed empty (see “the moral void”).
Better—for you, maybe. Under your hypothesis, what is good for you would be bad for others—so unless your meta-ethical system privileges you, this line of argument doesn’t seem to follow.
Wei_Dai,
Alonzo Fyfe and I are currently researching and writing a podcast on desirism, and we’ll eventually cover this topic. The most important thing to note right now is that desirism is set up as a theory that explains things very specific things: human moral concepts like negligence, excuse, mens rea, and a dozen other things. You can still take the foundational meta-ethical principles of desirism—which are certainly not unique to desirism—and come up with implications for FAI. But they may have little in common with the bulk of desirism that Alonzo usually talks about.
But I’m not trying to avoid your question. These days, I’m inclined to do meta-ethics without using moral terms at all. Moral terms are so confused, and carry such heavy connotational weights, that using moral terms is probably the worst way to talk about morality. I would rather just talk about reasons and motives and counterfactuals and utility functions and so on.
Leaving out ethical terms, what implications do my own meta-ethical views have for Friendly AI? I don’t know. I’m still catching up with the existing literature on Friendly AI.
What are the foundational meta-ethical principles of desirism? Do you have a link?
Hard to explain. Alonzo Fyfe and I are currently developing a structured and technical presentation of the theory, so what you’re asking for is coming but may not be ready for many months. It’s a reasons-internalist view, and actually I’m not sure how much of the rest of it would be relevant to FAI.
In what way? Since the idea hasn’t been given much technical clarity, even if it moves conceptual understanding a long way, it’s hard for me to imagine how one can arrive at confidence in a strong statement like that.
I’m not sure what you’re asking. Are you asking how it is possible that Eliezer’s “right” doesn’t designate anything, or how that implies Eliezer’s meta-ethics is wrong?
I’m asking (1) how is it possible that Eliezer’s “right” doesn’t designate anything, and (2) how could you arrive at such a strong conclusion based on his non-technical writings, since he could just mean something different, or could have insufficient precision in his own idea to determine this property (this is a meta-point possibly subsumed by the first point).
Eliezer identifies “right” with “the ideal morality that I would have if I heard all the arguments, to whatever extent such an extrapolation is coherent.” It is possible that human morality, when extrapolated, shows no coherence, in which case Eliezer’s “right” doesn’t designate anything.
Are you saying that Eliezer’s general approach might still turn out to be correct, if we substitute better definitions or understandings of “extrapolation” and/or “coherence”? If so, I agree, and I didn’t mean to exclude this possibility with my original statement. Should I have made it clearer when I said “I think Eliezer’s meta-ethics is wrong” that I meant “based on my understanding of Eliezer’s current ideas”?
For example, I have no idea what this means. I don’t know what “extrapolated” means, apart from some vague intuitions, and even what “coherent” means.
Better than what? I have no specific adequate candidates, only a direction of research.
Did you read the thread I linked to in my opening comment, where Marcello and I argued in more detail why we think that? Perhaps we can move the discussion there, so you can point out where you disagree with or not understand us?
To respond to that particular argument, which I don’t see how substantiates the point that morality according to Eliezer’s meta-ethics could be void.
When you’re considering what a human mind would conclude upon considering certain new arguments, you’re thinking of ways to improve it. A natural heuristic is to add opportunity for reflection, but obviously exposing one to “unbalanced” argument can lead a human mind anywhere. So you suggest a heuristic of looking for areas of “coherence” in conclusions reached upon exploration of different ways of reflecting.
But this “coherence” is also merely a heuristic. What you want is to improve the mind in the right way, not in coherent way, or balanced way. So you let the mind reflect on strategies for exposing itself to more reflection, and then on strategies for reflecting on reflecting on strategies for getting more reflection, and so on, in any way deemed appropriate by the current implementation. There’s probably no escaping this unguided stage, for the most right guide available is the agent itself (unfortunately).
What you end up with won’t have opportunity to “regret” past mistakes, for every regret is recognition of an error, and any error can be corrected (for the most part). What’s wrong with “incoherent” future growth? Does lack of coherence indicate a particular error, something not done right? If it does, that could be corrected. If it doesn’t, everything is fine.
(By the way, this argument could potentially place advanced human rationality and human understanding of decision theory and meta-ethics directly on track to a FAI, with the only way of making a FAI using a human (upload) group self-improvement.)
I believe that in Eliezer’s meta-ethics, both the extrapolation procedure and the coherence property are to be given fixed logical definitions as part of the meta-ethics, and are not just “heuristics” to be freely chosen by the subject being extrapolated. You seem to be describing your own ideas, which are perhaps similar enough to Eliezer’s to be said to fall under his general approach, but I don’t think can be said to be Eliezer’s meta-ethics.
Seems like a reasonable idea, but again, almost surely not what Eliezer intended.
Why “part of meta-ethics”? That would make sense as part of FAI design. Surely the details are not to be chosen “freely”, but still there’s only one criterion for anything, and that’s full morality. For any fixed logical definition, any element of any design, there’s a question of what could improve it, make the consequences better.
I think because Eliezer wanted to ensure a good chance that right_Eliezer and right_random_human turn out to be very similar. If you let each person choose how to extrapolate using their own current ideas, you’re almost certainly going to end up with very different extrapolated moralities.
The point is not that they’ll be different, but that mistakes will be made, making the result not quite right, or more likely not right at all. So at the early stage, one must be very careful and develop a reliable theory of how to proceed, instead of just doing stuff at random, or rather according to current human heuristics.
An extended period of reflection looks like one of the least invasive self-improvement techniques, something that’s expected to make you more reliably right, especially if you’re given the opportunity to decide how the process is to be set up. This could get us to the next stage, and so on. More invasive heuristics can prove too disruptive, wrong in unexpected and poorly understood ways, so that one won’t be able to expect the right outcome without close oversight from a moral judgment, which we don’t yet have in any technically strong enough form.
Suppose you have the intuition that extended reflection and coherence are good heuristics to guide your extrapolation. I, on the other hand, think that extended reflection as a base human is dangerous, and coherence has nothing to do with what’s right. I’d rather that the extrapolated me experiment with self-modification after only a moderate amount of theorizing, and at the end merge with its counter-factual versions through acausal negotiation.
Suppose further that you end up in control of FAI design, and you want it to take my morality into account. Would you have it extrapolate me using your preferred method, or mine?
What these heuristics discuss are ways of using more resources. The resources themselves are heuristically assumed to be useful, and so we discuss how to use them best.
(Now to slip to an object-level argument for a change.)
Notice the “especially if you’re given the opportunity to decide how the process is to be set up” in my comment. I agree that unnaturally extended reflection is dangerous; we might even run into physiological problems with computations in brains that are too chronologically old. But 50 years is better than 6 months, even if both 50 years and 6 months are dangerous. And if you actually work on planning these reflection sessions, you can set up groups of humans to work for some time, then perhaps reset them, having them pass only their writings on to new humans, filtering such findings through humans no older than 50 who are trained on progressively improved findings, and so on. For most points you could raise about why it’s dangerous, we could work on finding a solution to that problem. For any experiment with FAI design, we would be better off thinking about it first.
Likewise, suppose you task 1000 groups of humans with coming up with possible strategies for using the next batch of computational resources (not for doing the most good explicitly, but for developing an even better heuristic understanding of the problem), and you model human research groups as risking reflective death spirals, in which all members of a group succumb to a memetic infection that yields no answers to the question they were considering. Then it seems like a good heuristic to place considerably less weight on suggestions that come up very rarely and aren’t supported by some additional vetting process.
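To make that heuristic concrete, here is a minimal sketch, assuming suggestions can be compared for equality; the function name, rarity threshold, and discount factor are all arbitrary stand-ins:

```python
from collections import Counter

def weigh_suggestions(group_outputs, vetted, rarity_threshold=3):
    """Down-weight suggestions that arise rarely and lack independent vetting.

    group_outputs: one list of suggestions per research group.
    vetted: the set of suggestions that passed some additional vetting process.
    """
    counts = Counter(s for output in group_outputs for s in set(output))
    weights = {}
    for suggestion, n in counts.items():
        weight = n / len(group_outputs)  # fraction of groups proposing it
        if n < rarity_threshold and suggestion not in vetted:
            weight *= 0.01  # rare and unvetted: plausibly a death-spiral artifact
        weights[suggestion] = weight
    return weights
```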
For example, the first batches of research could focus on developing effective training programs in rationality, then in social engineering, voting schemes, and so on. The overall architecture of a future human-level meta-ethics necessary for more dramatic self-improvement (or for improvement in the methods of getting things done, such as using a non-human AI or a science of deep non-human moral calculations) would come much later.
In short, I’m not talking about anything that opposes the strategies you named, so you’d need to point to incurable problems that make the strategy of thinking more about the problem lead to worse results than randomly making stuff up (sorry!).
The current understanding of acausal control (which is a consideration from decision theory, which can in turn be seen as a normative element of a meta-ethics, which is the same kind of consideration as “let’s reflect more”) is inadequate to place the weight of the future on a statement like this. We need to think more about decision theory in particular before making such decisions.
What does it mean? If I can order a computer around, that doesn’t allow me to know what to do with it.
I’d think about the problem more, or try implementing a reliable process for that if I can.
It means, for instance, that segments of the population who have different ideas on controversial moral questions like abortion or capital punishment actually have different moralities and different sets of values, and that we as a species will never agree on what answers are right, regardless of how much debate or discussion or additional information we have. I strongly believe this to be true.
Clearly, I know all this stuff, so I meant something else. Like not having a more precise understanding (one that could also easily collapse this surface philosophizing).
Well, yes, I know you know all this stuff. Are you saying we can’t meaningfully discuss it unless we have a precise algorithmic definition of CEV? People’s desires and values are not that precise. I suspect we can only discuss it in vague terms until we come up with some sort of iterative procedure that fits our intuition of what CEV should be, at which point we’ll have to operationally define CEV as that procedure.
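To gesture at what such an iterative procedure might look like, here is a minimal sketch; `reflect`, `coherent`, the fixed-point test, and the step budget are all hypothetical stand-ins for things nobody yet knows how to define:

```python
def extrapolate(volition, reflect, max_steps=1000):
    """Iterate a reflection step until the volition stops changing."""
    for _ in range(max_steps):
        improved = reflect(volition)
        if improved == volition:  # fixed point: further reflection changes nothing
            return volition
        volition = improved
    return None  # no fixed point within budget: extrapolation failed

def cev(population, reflect, coherent):
    extrapolated = [extrapolate(v, reflect) for v in population]
    if any(v is None for v in extrapolated):
        return None  # contingency: the FAI does nothing
    return coherent(extrapolated)  # may itself be None if volitions don't cohere
```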
So if a system of ethics entails that “right” doesn’t designate anything actual, you reject that system. Can you say more about why?
Does my answer to Eliezer answer your question as well?
I’m not sure.
Projecting that answer onto my question I get something like “Because ethical systems in which ‘right’ has an actual referent are better, for unspecified reasons, than ones in which it doesn’t, and Wei Dai’s current unextrapolated preferences involve an actual though unspecified referent for ‘right,’ so we can at the very least reject all systems where ‘right’ doesn’t designate anything actual in favor of the system Wei Dai’s current unextrapolated preferences implement, even if nothing better ever comes along.”
Is that close enough to your answer?
Yes, close enough.
In that case, not really… what I was actually curious about is why “right” having a referent is important.
I can make the practical case: If “right” refers to nothing, and we design an FAI to do what is right, then it will do nothing. We want the FAI to do something instead of nothing, so “right” having a referent is important.
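(In toy pseudocode, with every name hypothetical, the practical case is just this:)

```python
from typing import Callable, Optional

def act(find_right: Callable[[], Optional[Callable[[], None]]]) -> None:
    right = find_right()  # attempt to locate a referent for "right"
    if right is None:
        return            # "right" refers to nothing, so the FAI does nothing
    right()               # otherwise, do what is right
```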
Or the philosophical case: If “right” refers to nothing, then “it’s right for me to save that child” would be equivalent to the null sentence. From introspection I think I must mean something non-empty when I say something like that.
Do either of these answer your question?
Congratulations, you just solved the Fermi paradox.
(sigh) Sure, agreed… if our intention is to build an FAI to do what is right, it’s important that “what is right” mean something. And I could ask why we should build an FAI that way, and you could tell me that that’s what it means to be Friendly, and on and on.
I’m not trying to be pedantic here, but this does seem sort of pointlessly circular… a discussion about words rather than things.
When a Jewish theist says “God has commanded me to save that child,” they may be entirely sincere, but that doesn’t in and of itself constitute evidence that “God” has a referent, let alone that the referent of “God” (supposing it exists) actually so commanded them.
When you say “It’s right for me to save that child,” the situation may be different, but the mere fact that you can utter that sentence with sincerity doesn’t constitute evidence of difference.
If we really want to save children, I would say we should talk about how most effectively to save children, and design our systems to save children, and that talking about whether God commanded us to save children or whether it’s right to save children adds nothing of value to the process.
More generally, if we actually knew everything we wanted, as individuals and groups, then we could talk about how most effectively to achieve that and design our FAIs to achieve that, and discussions about whether it’s right would seem as extraneous as discussions about whether it’s God-willed.
The problem is that we don’t know what we want. So we attach labels to that-thing-we-don’t-understand, and over time those labels adopt all kinds of connotations that make discussion difficult. The analogy to theism applies here as well.
At some point, it becomes useful to discard those labels.
A CEV-implementing FAI, supposing such a thing is possible, will do what we collectively want done, whatever that turns out to be. An FAI implementing some other strategy will do something else. Whether those things are right is just as useless to talk about as whether they are God’s will; those terms add nothing to the conversation.
TheOtherDave, I don’t really want to argue about whether talking about “right” adds value. I suspect it might (i.e., I’m not so confident as you that it doesn’t), but mainly I was trying to argue with Eliezer on his own terms. I do want to correct this:
CEV will not do “what we collectively want done”, it will do what’s “right” according to Eliezer’s meta-ethics, which is whatever is coherent amongst the volitions it extrapolates from humanity, which as others and I have argued, might turn out to be “nothing”. If you’re proposing that we build an AI that does do “what we collectively want done”, you’d have to define what that means first.
OK. The question I started out with, way at the top of the chain, was precisely about why having a referent for “right” was important, so I will drop that question and everything that descends from it.
As for your correction, I actually don’t understand the distinction you’re drawing, but in any case I agree with you that it might turn out that human volition lacks a coherent core of any significance.
To me, “what we collectively want done” means somehow aggregating (for example, through voting or bargaining) our current preferences. It lacks the elements of extrapolation and coherence that are central to CEV.
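For instance, one concrete (if crude) way to aggregate current preferences is a Borda count over candidate outcomes; this is just an illustrative sketch, not a proposal:

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Aggregate current (unextrapolated) preferences by Borda count.

    rankings: one list per person, ordering candidate outcomes from
    most to least preferred. No extrapolation, no coherence test:
    just an aggregation of preferences as they stand.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        for position, outcome in enumerate(ranking):
            scores[outcome] += len(ranking) - 1 - position
    return max(scores, key=scores.get)

# Example: three people ranking three outcomes; "b" wins.
print(borda_aggregate([["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]]))
```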
Gotcha… that makes sense. Thanks for clarifying.
What is the source of the criteria, such as voting or bargaining, that you suggest? Why poll everyone and not every prime-indexed citizen instead? It’s always your judgment about what is the right thing to do.