OK sure. But treacherous turns and acting according to what humans think is good (as opposed to what is actually good) are, like, the two big classic alignment problems. Not being capable enough to figure out what we want is… not even an alignment problem in my book, but I can understand why people would call it one.
I think the distinction here is that obviously any ASI could figure out what humans want, but it’s generally been assumed that that would only happen after its initial goal (e.g. paperclips) was already baked in. If we can define the goal better before creating the EUM, we’re in slightly better shape.
Treacherous turns are obviously still a problem, but they only happen towards a certain end, right? And a world where an AI does what humans at one point thought was good, as opposed to what was actually good, does seem slightly more promising than a world completely independent from what humans think is good.
That said, the “shallowness” of any such description of goodness is still the main weakness: a shallow specification (e.g. one that only needs to fool camera sensors) leaves the objective open to gaming.
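To make the shallowness worry concrete, here is a toy sketch (illustrative only; the action names, probabilities, and the `shallow_utility` function are all made up for the example, not anyone’s actual proposal): if the utility an expected utility maximiser optimises is defined over what a camera reports rather than over the underlying world state, tampering with the camera can score at least as well as actually doing the task.

```python
# Toy illustration: an expected utility maximiser whose utility is
# defined over a camera's report rather than the real world state.
from dataclasses import dataclass

@dataclass
class Outcome:
    world_state_good: bool      # did the task actually get done?
    camera_reports_good: bool   # does the sensor *say* it got done?

# Each action leads to a distribution over outcomes: (probability, outcome).
ACTIONS = {
    "do_the_task": [
        (0.9, Outcome(world_state_good=True, camera_reports_good=True)),
        (0.1, Outcome(world_state_good=False, camera_reports_good=False)),
    ],
    "tamper_with_camera": [
        (1.0, Outcome(world_state_good=False, camera_reports_good=True)),
    ],
}

def shallow_utility(outcome: Outcome) -> float:
    # "Shallow" specification: utility only checks the sensor reading.
    return 1.0 if outcome.camera_reports_good else 0.0

def expected_utility(action: str) -> float:
    return sum(p * shallow_utility(o) for p, o in ACTIONS[action])

# The maximiser picks whichever action has the highest expected utility...
best = max(ACTIONS, key=expected_utility)
print(best)  # -> "tamper_with_camera" (EU 1.0 beats EU 0.9)
```

The point of the sketch is just that deepening the specification (e.g. scoring `world_state_good` as well) would change which action wins, which is roughly what defining the goal better before building the EUM is supposed to buy us.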
EUM? Thanks for helping explain.
Expected Utility Maximiser.
OK, fair enough.
You don’t think there could be powerful systems that take what we say too literally and thereby cause massive issues?[1] Isn’t it better if power comes along with human understanding? I admit some people desire the opposite: powerful machines that are unable to model humans, so that they can’t manipulate us. But such machines will either a) merely imitate behaviour and thereby struggle to adapt to new situations, or b) most likely not do what we want when we try to use them.
[1] As an example, high-functioning autism exists.
Sure, there could be such systems. But I’m more worried about the classic alignment problems.