However, this risk is significantly different. If you believed that superintelligent AI must be goal-directed because of math, then your only recourse for safety would be to make sure that the goal is good, which is what motivated us to study ambitious value learning. But if the argument is actually that AI will be goal-directed because humans will make it that way, you could try to build AI that is not goal-directed but can still do the things that goal-directed AI can do, and have humans build that instead.
I’m curious about the extent to which people have felt that “superintelligent AI must be goal-directed” is the primary problem. Now that I see it expressed in this form, I realize that there have for a long time been lots of papers and comments which seem to suggest that this might be people’s primary concern. But I always kind of looked at it from the perspective of “yeah, this is one concern, but even assuming that we could make a non-goal-directed AI, that doesn’t solve the problem of other people having an incentive to make goal-directed AI (and that’s the much more pressing problem)”. So since we seemed to agree on goal-directed superintelligence being a big problem, maybe I overestimated the extent of my agreement with other people concerned about goal-directed superintelligence.
I also am unsure about how much people think that’s the primary problem. I feel fairly confident that Eliezer thinks (or thought at some recent point) that this was the primary problem. I came into the field thinking of this as the primary problem.
It certainly seems that many people assume that a superintelligent AI system has a utility function. I don’t know their reasons for this assumption.
The standard rebuttal here is that even if a superintelligent AI system is not goal-directed, we should be concerned that the AI will spontaneously develop goal-directed behavior, because such behavior is instrumentally valuable for doing whatever it is doing (and isn’t “doing whatever it is doing” itself a “goal”, even if the AI does not conceive of it as a goal, in the same way that a calculator has a “goal” or purpose even though the calculator is unaware of it?). This is of course contingent on it being “superintelligent”.
For what it’s worth this is also the origin, as I recall it, of concerns about paperclip maximizers: you won’t build an AI that sets out to tile the universe with paperclips, but through a series of unfortunate misunderstandings it will, as a subagent or an instrumental action, end up optimizing for paperclips anyway because it seemed like a good idea at the time.
It sounds to me like you’re requiring “superintelligent” to include “has a goal” as part of the definition. If that’s part of the definition, then I would rephrase my point as “why do we have to build something superintelligent? Let’s instead build something that doesn’t have a goal but is still useful, like an AI system that follows norms.”
See also this comment, which answers a related question.
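To make the distinction in the comment above a bit more concrete, here is a minimal sketch (my own illustration, not anything from the thread or the linked comment) contrasting a goal-directed agent, which picks actions by optimizing a utility function over predicted outcomes, with a norm-following system, which merely filters candidate actions against constraints without optimizing for any outcome. All function and variable names are hypothetical.

```python
# Minimal sketch: goal-directed choice vs. norm-following choice.
# All names are hypothetical illustrations, not a real API.

def goal_directed_act(actions, predict_outcome, utility):
    """Pick the action whose predicted outcome maximizes the utility function."""
    return max(actions, key=lambda a: utility(predict_outcome(a)))

def norm_following_act(actions, violates_norm, default_action):
    """Pick the first action that violates no norm; nothing is being optimized."""
    for a in actions:
        if not violates_norm(a):
            return a
    return default_action

if __name__ == "__main__":
    actions = ["fetch coffee", "disable off-switch", "do nothing"]
    payoff = {"fetch coffee": 1.0, "disable off-switch": 5.0, "do nothing": 0.0}

    # The optimizer picks the high-payoff but norm-violating action.
    print(goal_directed_act(actions,
                            predict_outcome=lambda a: a,
                            utility=lambda outcome: payoff[outcome]))

    # The norm-follower never consults a utility function at all.
    print(norm_following_act(actions,
                             violates_norm=lambda a: a == "disable off-switch",
                             default_action="do nothing"))
```

Whether a system like the second one can be made both capable and safe is exactly what the thread is debating; the sketch is only meant to show that “useful” and “optimizes a goal” are not the same property by construction.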
Does anyone have an incentive to make a non-goal-directed AI they can’t control?
(did you mean to ask goal-directed?)
Related: Gwern wrote a post arguing that people have an incentive to build a goal-directed AI over a non-goal directed AI. See the references here.
I’m not seeing the “can’t control”. Sure, agent AI is more powerful than tool AI, and more powerful things need more control to make them do what you want.
The majority of people choose to make non-goal-directed uncontrolled natural-intelligence agents. It seems likely that as general AI becomes feasible, this drive to procreate will motivate at least some to create such a thing.
It doesn’t seem likely to me. People don’t procreate in order to fulfil the abstract definition you gave, they procreate to fulfil biological urges and cultural mores.
You have more faith in your model of people’s motivation than I do in mine. But that doesn’t mean you’re right. There are tons of examples in literature and in daily life of misdirected or redirected biological drives, and making an AGI “child” seems like so mundane a motive that I hadn’t considered, until your comment, that it might NOT be a strong enough motive.
I have to admit I’ve seen this as a strong motive for creating AGI in both myself and others. Maybe it’s because I just don’t get along with other humans very well (or specifically I fail to model them properly), or because I feel as if I would understand AGI better than them, but it just seems much more appealing to me than having an actual child, at least right now. Specifically, my goal is (assuming I understand correctly) to create non-goal-directed bounded artificial intelligence agents, so… it’s pretty similar, at least. It’s certainly a strong enough motive for some people.