Is there a writeup somewhere of why we should expect an unaltered me to endorse the proposals that would, if implemented, best instantiate our coherent extrapolated volition?
I’m having a very hard time seeing why I should expect that; it seems to assume a level of consistency between what I endorse, what I desire, and what I “actually want” (in the CEV sense) that just doesn’t seem true of humans.
I guess the simplest thing I can say is: there’s a lot of stuff we don’t think of because our hypothesis space consists only of things we’ve seen before. We expect that an AGI, being more intelligent than any individual human, could afford a larger hypothesis space and sift it better, which is why it would be capable of coming up with courses of action we value highly but did not, ourselves, invent.
Think retrospectively: nobody living 10,000 years ago would have predicted the existence of bread, beer, baseball, or automobiles. And yet, modern humans find ways to like all of those things (except baseball ;-)).
All else failing, something like CEV or another form of indirect normativity should at least give us an AI Friendly enough that we can try to use an injunction architecture to restrict it to following our orders or something, and it will want to follow the intent behind the orders.
If you’re this skeptical about CEV, would you like to correspond by email about an alternative FAI approach under development, called value learners? I’ve been putting some tiny bit of thought into them on the occasional Saturday. I can send you the Google Doc of my notes.
Well, I certainly agree that there are lots of things we don’t think about, and that a sufficiently intelligent system can come up with courses of action that humans will endorse, and that humans will like all kinds of things that they would not have endorsed ahead of time… for that matter, humans like all kinds of things that they simultaneously don’t endorse.
And no, not really interested in private discussion of alternate FAI approaches, though if you made a post about it I’d probably read it.
a sufficiently intelligent system can come up with courses of action that humans will endorse, and that humans will like all kinds of things that they would not have endorsed ahead of time… for that matter, humans like all kinds of things that they simultaneously don’t endorse.
Generally we aim to come up with things humans will both like and endorse. Optimizing for “like” but not “endorse” leads to various forms of drugging or wireheading (even if Eliezer does disturb me by being tempted towards such things). Optimizing for “endorse” but not “like” sounds like carrying the dystopia we currently call “real life” to its logical, horrid conclusion.
if you made a post about it I’d probably read it.
How well-founded does a set of notes or thoughts have to be in order to be worth posting here?
we aim to come up with things humans will both like and endorse
(shrug) Well, OK. If I consider the set of plans A which maximize our values when implemented, and the set of plans B which we endorse when they’re explained to us, I’m prepared to believe that the intersection of A and B is nonempty. And really, any technique with a chance worth considering of coming up with anything in A is sufficiently outside my experience that I won’t express an opinion about whether it’s noticeably less likely to come up with something in that intersection. So, go for it, I guess.
How well-founded does a set of notes or thoughts have to be in order to be worth posting here?
Depends on whom you ask. I’d say it’s the product of (novel × relevant × concise × entertaining × coherent) that gets compared to a threshold; well-founded is a nice bonus but not critical. That said, posts that don’t make the threshold will frequently be berated for being ill-founded if they are.
That’s… interesting.