Yeah, it’s bad news in terms of timelines, but good news in terms of an AI being able to implicitly figure out what we want it to do. Obviously, it doesn’t address issues like treacherous turns or acting according to what humans think is good as opposed to what is actually good; and I’m not claiming that this is necessarily net-positive, but there’s a silver lining here.
OK sure. But treacherous turns and acting according to what humans think is good (as opposed to what is actually good) are, like, the two big classic alignment problems. Not being capable enough to figure out what we want is… not even an alignment problem in my book, but I can understand why people would call it one.
I think the distinction here is that obviously any ASI could figure out what humans want, but it’s generally been assumed that that would only happen after its initial goal (e.g. paperclips) was already baked in. If we can define the goal better before creating the EUM, we’re in slightly better shape.
Treacherous turns are obviously still a problem, but they only happen towards a certain end, right? And a world where an AI does what humans at one point thought was good, as opposed to what was actually good, does seem slightly more promising than a world completely independent from what humans think is good.
That said, the “shallowness” of any such description of goodness (e.g. one that can be satisfied merely by fooling camera sensors) is still the primary vulnerability that lets the objective be gamed.
EUM? Thanks for helping explain.
Expected Utility Maximiser.
OK, fair enough.
You don’t think there could be powerful systems that take what we say too literally and thereby cause massive issues?[1] Isn’t it better if power comes along with human understanding? I admit some people desire the opposite, for powerful machines to be unable to model humans so that they can’t manipulate us, but such machines will either a) be merely imitating behaviour and thereby struggle to adapt to new situations or b) most likely not do what we want when we try to use them.
As an example, high-functioning autism exists.
Sure, there could be such systems. But I’m more worried about the classic alignment problems.
Alignment:
1) Figure out what we want.
2) Do that.
People who are worried about (2) may still be worried. I’d agree with you on (1): it does seem that way. (I initially thought of it as understanding things/language better; the human nature of jokes is easily taken for granted.)