AGIs will have a causal model of the world. If their own output is part of that model, and they work forward from there to the real-world consequences of their outputs, and they choose outputs partly based on those consequences, then it’s an agent by (my) definition. The outputs are called “actions” and the consequences are called “goals”. In all other cases I’d call it a service, unless I’m forgetting about some edge cases.
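To make the definition concrete, here’s a minimal toy sketch of the distinction: the “agent” rolls each candidate output forward through a causal world model and picks the one whose predicted consequence best satisfies a goal, while the “service” maps input to output by a fixed rule with no forward simulation. All names and numbers are illustrative, not any real system:

```python
# Toy sketch of the agent definition above. All names are hypothetical.

def world_model(state, output):
    """Toy causal model: predicts the consequence of emitting `output`.
    Here the 'world' just adds the output to the current state."""
    return state + output

def goal(consequence, target=10):
    """Score consequences; higher is better (closer to the target)."""
    return -abs(consequence - target)

def agent_choose(state, candidate_outputs):
    """Agent: picks the output whose *predicted consequence* scores best."""
    return max(candidate_outputs, key=lambda o: goal(world_model(state, o)))

def service_choose(state, candidate_outputs):
    """Service: a fixed input-output rule, with no forward simulation of
    consequences (here: just return the first candidate)."""
    return candidate_outputs[0]

print(agent_choose(3, [1, 5, 7, 9]))    # 7, since 3 + 7 hits the target 10
print(service_choose(3, [1, 5, 7, 9]))  # 1, regardless of consequences
```

The point of the sketch is that the difference lives entirely in the computational process, not in the output channel: both functions emit the same kind of output.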
A system whose only output is text on a screen can be either a service or an agent, depending on the computational process generating the text. A simple test: if there were a weird, non-obvious way to manipulate the people reading the text (in the everyday, bad-connotation sense of “manipulate”), would the system take advantage of it? Agents would by default (unless they had a complicated goal involving ethics etc.); services would not by default.
Nobody knows how to build a useful AI capable of world-modeling and formulating intelligent plans but which is not an agent, although I’m personally hopeful that it might be possible by self-supervised learning (cf. Self-Supervised Learning and AGI safety).
This sounds like we’re resting on an abstract generalization of ‘outputs.’ Is there any work being done to distinguish between different outputs, and consider how a computer might recognize a kind it doesn’t already have?
Right, I was using “output” in a broad sense of “any way that the system can causally impact the rest of the world”. We can divide that into “intended output channels” (text on a screen etc.) and “unintended output channels” (sending out radio signals using RAM etc.). I’m familiar with a small amount of work on avoiding unintended output channels (e.g. using homomorphic encryption or fancy vacuum-sealed Faraday cage boxes).
Usually the assumption is that a superintelligent AI will figure out what it is, and where it is, and how it works, and what all its output channels are (both intended and unintended), unless there is some strong reason to believe otherwise (example). I’m not sure this answers your question … I’m a bit confused at what you’re getting at.
I am aiming directly at questions of how an AI that starts with only a robotic arm might get to controlling drones or trading stocks, from the perspective of the AI. My intuition, driven by Moravec’s Paradox, is that each new kind of output (or input) has a pretty hefty computational threshold associated with it, so I suspect that the details of the initial inputs/outputs will have a big influence on the risk any given service or agent presents.
The reason I am interested in this is that it feels like doing things has no intrinsic connection to learning things, and that we only link them because so much of our learning and doing is unconscious. That is to say, I suspect actions are orthogonal to intelligence.
Regarding “computational threshold”, my working assumption is that any given capability X is either (1) always and forever out of reach of a system by design, or (2) completely useless, or (3) very likely to be learned by a system, if the system has long-term real-world goals. Maybe it takes some computational time and effort to learn it, but AIs are not lazy (unless we program them to be). AIs are just systems that make good decisions in pursuit of a goal, and if “acquiring capability X” is instrumentally helpful towards achieving goals in the world, it will probably make that decision if it can (cf. “Instrumental convergence”).
If I have a life goal that is best accomplished by learning to use a forklift, I’ll learn to use a forklift, right? Maybe I won’t be very fluid at it, but fine, I’ll operate it more slowly and deliberately, or design a forklift autopilot subsystem, or whatever...
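The capability-acquisition argument above is just an expected-value comparison, which can be sketched in a few lines. The planner pays an upfront learning cost for capability X whenever X’s contribution to the goal outweighs that cost; “laziness” never enters into it. All numbers here are made up for illustration:

```python
# Toy illustration of instrumental capability acquisition: learn X
# exactly when (value of goal with X) - (learning cost) exceeds the
# value of the goal without X. Numbers are purely illustrative.

def plan(goal_value_without_x, goal_value_with_x, learning_cost):
    """Return ('learn X', value) or ('skip X', value), whichever is better."""
    with_x = goal_value_with_x - learning_cost
    if with_x > goal_value_without_x:
        return ("learn X", with_x)
    return ("skip X", goal_value_without_x)

# Learning to use the forklift costs effort (2), but the goal is worth
# much more with it (10) than without it (3): the planner learns it.
print(plan(3, 10, 2))  # ('learn X', 8)

# If the capability is useless for the goal, it never gets learned.
print(plan(3, 3, 2))   # ('skip X', 3)
```

This is the “instrumental convergence” logic in miniature: the decision to acquire a capability falls out of ordinary goal pursuit, with the learning cost treated as just another expense.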