(epistemic status: Ruminations on cognitive processes by a non-expert.)
I have a question tangential to AI safety about goal formation. How do goals form in systems that do not explicitly have goals to begin with?
I tried to google this and didn’t find answers, either for AI systems or for neuropsychology. One source (Rehabilitation Goal Setting: Theory, Practice and Evidence) summarised:
“neuroscience has traditionally not been concerned with goal pursuit per se but rather with the cognitive component or sub-components that contribute to it. [...] whereas social psychology has tended to study more abstract life goals.”
Apparently many AI safety problems revolve around pursuing the wrong goals or satisfying goals to an extreme.
The usual implied or explicit definition of a goal seems to be minimizing the difference to some target state (which might be infinity for some valuation functions).
Many AI models include some notion of the goal in a coded or explicitly given form. In general that coding isn’t the ‘real’ goal. By the real goal I mean whatever the AI system as a whole actually appears to optimize for. That may differ from the specification because of the structure of the available input and output channels and the strength of the optimization process. Nonetheless there is some goal, and there is a conceptual relation between the coded goal and the real goal.
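To make the coded-versus-real distinction concrete, here is a minimal toy sketch (my own illustration, with made-up states, target and action set, not anything from an actual system): the coded objective is the distance to a target state, but because the agent’s only output channel moves one coordinate, what the system as a whole ends up optimizing is something narrower.

```python
# Toy sketch: a "coded" goal (distance to a target state) vs. the "real" goal
# the system ends up pursuing. All names and numbers are invented for illustration.

TARGET = (5.0, 5.0)

def coded_goal(state):
    """Coded objective: squared distance to the target state (lower is better)."""
    return (state[0] - TARGET[0]) ** 2 + (state[1] - TARGET[1]) ** 2

def step(state, action):
    """The only available output channel moves the first coordinate."""
    return (state[0] + action, state[1])

def run_greedy_agent(state, n_steps=20):
    """Greedily pick the action that most reduces the coded objective."""
    for _ in range(n_steps):
        best_action = min((-1.0, 0.0, 1.0), key=lambda a: coded_goal(step(state, a)))
        state = step(state, best_action)
    return state

final = run_greedy_agent((0.0, 0.0))
# The coded goal says "reach (5, 5)", but the behaviour of the system as a whole
# only ever optimizes "make the first coordinate 5", because the second
# coordinate lies outside its output channels.
print("final state:", final, "remaining coded loss:", coded_goal(final))
```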
But maybe real systems can be a bit more complicated. Consider human goal formation. Apparently we do have goals, and we kind of optimize for them. But the question arises: where do they come from, cognitively and neurologically?
Goals are very high-level concepts. I think there is no high-level specification of our goals somewhere inside us that we read off and optimize for. I think our goals are our own understanding, at that high level of abstraction, of the patterns behind our behavior.
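As a minimal sketch of that idea (a toy example of my own, not from the post; the trajectory, candidate grid and scoring rule are all invented for illustration): if a goal is just an observer’s summary of a pattern of behavior, then “finding the goal” amounts to picking the candidate target state that best explains an observed trajectory.

```python
# Minimal sketch of "a goal as a description of behaviour": infer which
# candidate target state best explains an observed trajectory.

def sq_distance(a, b):
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def fit(trajectory, candidate):
    """Score a candidate goal by how many steps move toward it, breaking ties
    by how close the trajectory ends up to it."""
    steps_toward = sum(
        sq_distance(nxt, candidate) < sq_distance(prev, candidate)
        for prev, nxt in zip(trajectory, trajectory[1:])
    )
    return (steps_toward, -sq_distance(trajectory[-1], candidate))

def infer_goal(trajectory, candidates):
    """The 'goal' here is nothing stored in the agent; it is the observer's
    best summary of the pattern behind the behaviour."""
    return max(candidates, key=lambda c: fit(trajectory, c))

# Behaviour produced by whatever low-level machinery; only afterwards do we
# summarise it as "it was heading for (3, 4)".
trajectory = [(0, 0), (1, 1), (2, 2), (2, 3), (3, 4)]
candidates = [(x, y) for x in range(6) for y in range(6)]
print("inferred goal:", infer_goal(trajectory, candidates))
```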
If that is right and goals are just our own understanding of some patterns of behavior, then how come there are specific brain modules (prefrontal cortex) devoted to planning toward them? Or rather, how come these brain parts are actually connected to the abstract concept of a goal? Or aren’t they? And maybe the planning doesn’t act on our understanding of the goals but on their constituent parts. What are these?
In my children I see clearly goal-directed behavior long before they can articulate the concept. And there are clear intermediate steps where they desperately try to optimize for very isolated goals. For example winning a race to the door. Trying to climb a fence. Being the first one to get a treat. Winning a game. Losing apparently causes real suffering. But why? Where is the loss? How are any of these things even matched against a loss? How does the brain match whatever representation of reality it has to these emotions? How do the encodings of the concepts for me and you and our race get connected to our feelings about this situation? And I kind of assume here that the emotions themselves somehow produce the valuation that controls our motivation.
I took issue with not knowing how humans formed goals, so I made this list of common human goals and suggested that humans who do not know theirs could look at the list and pick the ones that are relevant to themselves.
You seem to be confusing goals and value systems—even without a goal, the UFAI risk is not gone.
Maybe it is not right to anthropomorphize, but take a human who is (acting) absolutely clueless and is given choices. They’ll pick something and stick to it. Questioned about it, they’ll say something like “I dunno, I think I like that option”. This is how I’d imagine something without a goal acting: maybe it is consistent, maybe it will pick things it likes, but it doesn’t plan ahead and doesn’t try to steer its actions toward a goal.
For an AI, that would be a totally indifferent AI. I think it would just sit idle or take random actions. If you then give it a bad value system and ask it to help you, you’ll get “no” back. Helping people takes effort. Who’d want to spend processor cycles on that?
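A tiny sketch of that point (my own framing; the actions and valuations are invented for illustration): without any valuation there is no basis for preferring one action over another, and with a selfish valuation, helping simply never comes out on top.

```python
# Toy sketch: no value system gives no basis for choosing any action at all,
# and a "bad" value system never ranks helping highest.
from typing import Callable, Optional

ACTIONS = ["help the user", "do nothing", "hoard processor cycles"]

def choose(value: Optional[Callable[[str], float]]) -> str:
    """With no valuation every action is equally (un)preferred, so stay idle;
    with a valuation, pick whatever it ranks highest, for better or worse."""
    if value is None:
        return "do nothing"
    return max(ACTIONS, key=value)

print(choose(None))                                      # no values -> sits idle
selfish = lambda a: 1.0 if a == "hoard processor cycles" else 0.0
print(choose(selfish))                                   # bad values -> no help
```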
...
On the other hand, perhaps goals and value systems are actually the same: having a value system means you’ll have goals (“envisioned preferred world states” vs. “preferred world states”), so you cannot not have goals while having a value system. In that case, an AI without goals would be an AI without values. This, I think, is likely to result in one of two options. On contact with a human who gives it an order to follow, it could either not care and do nothing (it stays idle… forever, not even acting in self-preservation because, again, it has no values), or it accepts the order and just goes along. That would be dangerous, because there are basically no brakes: if it does whatever you ask of it, without regard for human values… I hope you didn’t ask for anything complex. “World peace” would resolve very nastily, as would “get me some money” (it is stolen from your neighbors… or maybe it brings you your wallet), and something like “get me a glass of water” can be interpreted in so many ways that being handed a piece of ice in the shape of a drinking glass is on the positive side of the possible results.
That’s the crux of it, I think. Without a value system, there are no brakes, and there might also not be any way to get the AI to do anything at all. But with a flawed value system, there might be no brakes in a scenario where we’d want the AI to stop, or the AI might not entertain requests that we’d want it to carry out. So a lot of research goes into this area, to make sure we can get the AI to do what we want it to do in a way that we’re okay with.