It’s the opposite of the lesson I usually try to teach, but in this one case I’ll say it: it’s not the world that’s mad, it’s you.
I don’t think he is “mad”, at least not if you press him enough. A few weeks ago I posted the following comment on one of his Facebook submissions:
Will, this is off-topic, but I’m curious. What would you do if 1.) any action would be ethically indifferent, 2.) the expected utility hypothesis was bunk, and 3.) all that really counted was what you want based on naive introspection?
I’m asking because you (and others) seem to increasingly lose yourselves in the logical implications of maximizing expected utility and ethical considerations.
Take care that you don’t confuse squiggles on paper with reality.
His reply (emphasis mine):
Alexander, I don’t think that’s a particularly good model of my actual reasoning. The simple arguments I have for thinking about what I think about don’t involve Pascalian reasoning or conjunctions of weird beliefs, and when it comes to policy I am one of the most vocal critics on LW of the unfortunate trend where otherwise smart people attempt to implement complicated policies due to the output of some incredibly brittle model, often without even taking into account opportunity costs or even considering any obviously better meta-level policies. That is insanity, and completely unrelated to any of the kinds of thinking that I do.
The reasons for my current obsessions are pretty simple, though it’s worth noting that I am intentionally keeping my options very, very open.
Seed AI appears to be very possible to engineer. “Provably”-FAI isn’t obviously possible to engineer given potential time constraints. If we could make a seed AI that was reflective enough, for example due to a strong founding in what Steve Rayhawk wants from a “Creatorless Decision Theory”, and we had strong arguments about attractors that such an agent might fall into, and we had reason to believe that it might converge on something like FAI, then there might come a time when we should launch such a seed AI, even without all the proofs—for example due to being in a politically or existentially volatile situation.
Between BigNum-maximizer Goedel machine-like foomers and provably-FAI foomers, there’s a long continuum of AIs that are more or less reflective on the source of their utility function and what it means that some things rather than some other things caused that particular utility function to be there rather than some other one. The typical SingInst argument that a given AGI will be some kind of strict literalist with respect to what it thinks is its utility function is simply not very strong. In fact, it even contradicts Omohundro’s Basic AI Drives paper, which briefly addresses the topic: “For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated to reflect on their goals and to make them explicit.” Some small amount of reflection would seem to open the door for arbitrarily large amounts of reflection, especially if the AI is simultaneously modifying its decision theory—obviously we’d rather avoid an argument of degree where unchained intuitions are allowed to run amok.
We can make the debate more technical by looking at Goedel machines and program semantics. I have some relevant ideas, but perhaps Schmidhuber’s talk about some Goedel machine implementations in a few days at AGI2011 will prove enlightening.
I’m already losing steam, so we’ll just call that Part One. Part Two and maybe a Part Three will talk about: decision theories upon self-modification; decision theory in context; abstract models of optimization & morality; timeless control and game theory of the big red button; and probably other miscellaneous related ideas.
But after all that I don’t really know how to answer your question. Wants… Even if somehow the thousand aversions that are shoulds were no longer supposed to compel me, they’d still be there, and I’d still be motivationally paralyzed, or whatever it is I am. I’d probably do the exact same things I’m doing now: living in Berkeley with my girlfriend, eating good food, regularly visiting some of the coolest people on Earth to talk about some of the most interesting ideas in all of history. All of that sounds pretty optimal as far as living on a budget of zero dollars goes. If the aversions were lifted, but I was still me, then I haven’t a good idea what I’d do. I’d be happy to immerse myself in the visual arts community, perhaps, or if I thought I could be brilliant I’d revolutionize music cognition and write by far the best artificial composer algorithms. I’d go to various excellent universities for a year or two, and if somehow I found an easy way to make money along the way, e.g. with occasional programming jobs, then I’d frequently travel to Europe and then Asia. I imagine I’d spend very many months in Germany, especially Bavaria. Walking along green mountains or resting under trees in meadow orchards, ideally with a MacBook Pro and a drawing tablet handy. I’d do much meditation and probably progress very quickly, and at some point I expect I’d develop a sort of self-refuge. But I don’t know, I’m just saying things that sound nice, as if I can’t have them, and I may very well end up doing most of them no matter what future I lead.
It seems to me that he’s still with the rest of humanity when it comes to what he is doing on a daily basis and his underlying desires.
I don’t think he is “mad”, at least not if you press him enough.
(You argue that the madness in question, if present, is compartmentalized. The intended sense of “madness” (normal use on LW) includes the case of compartmentalized madness, so your argument doesn’t seem to disagree with Eliezer’s position.)
“For one thing, it has to make those objectives clear to itself. If its objectives are only implicit in the structure of a complex circuit or program, then future modifications are unlikely to preserve them. Systems will therefore be motivated”
Hold on. Motivated by what? If its objectives are only implicit in the structure, then why would those objectives include their own preservation?
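To make the distinction Omohundro is pointing at a bit more concrete, here is a minimal toy sketch in Python. It isn’t from the paper or from anyone in this thread, and every name in it is made up for illustration: an objective that exists only as a side effect of how a policy happens to be written is easily lost under an arbitrary rewrite, whereas an explicitly represented utility function at least gives the system something to check candidate rewrites against.

```python
import random

def implicit_policy(state):
    # The "objective" (prefer higher-numbered states) exists only as a
    # side effect of how this particular code happens to be written.
    return state + 1

def arbitrary_rewrite(policy):
    # A self-modification that knows nothing about any objective: it just
    # swaps in some other program. Nothing constrains the new program to
    # keep doing whatever the old code happened to do.
    return lambda state: state - random.randint(0, 2)

def explicit_utility(state):
    # The same preference, written down as an explicit utility function.
    return state

def guarded_rewrite(policy, utility, trials=100):
    # Accept a candidate rewrite only if it scores at least as well as the
    # current policy on the explicit utility, over some sampled states.
    candidate = arbitrary_rewrite(policy)
    states = [random.randint(0, 10) for _ in range(trials)]
    old_score = sum(utility(policy(s)) for s in states)
    new_score = sum(utility(candidate(s)) for s in states)
    return candidate if new_score >= old_score else policy

# With only an implicit objective, a rewrite silently loses it:
lost = arbitrary_rewrite(implicit_policy)
# With the objective made explicit, bad rewrites can at least be rejected:
kept = guarded_rewrite(implicit_policy, explicit_utility)
print(lost(5), kept(5))  # the guarded version still maps 5 to 6
```

This doesn’t answer where the motivation to do the checking comes from—which is exactly the question above—it only illustrates why objectives that are merely implicit tend not to survive modification, while explicit ones can be checked against.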
((For those who haven’t seen it yet: http://lesswrong.com/lw/2q6/compartmentalization_in_epistemic_and/ ))
Belatedly.