Thus the simple trainable AI with a particular focus—write automated news stories—will be trained, through feedback, to learn about its editors/controllers, to distinguish them, to get to know them, and, in effect, to manipulate them.
Detecting and adapting to the individual controllers doesn't seem particularly bad to me.
Emotionally manipulating the controllers using the content of the stories would be more worrying, but note that this is essentially only possible if the AI is allowed to plan more than one story at a time. If the AI can do that, then it can trade off the reward obtained by the story at time t for greater rewards at times > t. Otherwise, any trade-off will be limited to the different parts of each story, which greatly reduces the opportunities for significant emotional manipulation of the controllers.
I see no reason this story-writing AI would need to be allowed to plan more than one story at a time.
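To put it concretely (a rough sketch, with $r_t$ denoting the reward the controllers give the story produced at time $t$, and $\gamma$ a discount factor): a planner that optimizes across stories chooses the current story and its successors to maximize something like
$$\mathbb{E}\Big[\sum_{k \ge 0} \gamma^{k}\, r_{t+k}\Big],$$
while a single-story planner maximizes only $\mathbb{E}[r_t]$. Only the first objective gives the AI any incentive to accept a lower $r_t$ in order to push the controllers into a state that yields higher $r_{t+1}, r_{t+2}, \dots$; under the second, any manipulation is confined to the current story.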
I think this is an example of a general issue in safe AI design that you and other FAI folks overlook: dynamic inconsistency can provide intrinsic protection against unwanted long-term strategies by the AI.
You seem to always implicitly assume that the AI will be an agent trying to maximize a (discounted) utility or reward over a long, ideally infinite, time horizon, that is, you assume that the AI will be approximately dynamically consistent. This may be a reasonable requirement for an autonomous agent that needs to operate for extended times without direct human supervision, but not for a tool AI.
The work of a tool AI can be naturally broken into self-contained tasks, and if the AI doesn’t maximize utility or reward over multiple tasks, then any treacherous plan to gain utility in ways we would disapprove of will have to be confined to a single task. This is not a 100% safety guarantee, but certainly it makes the AI safety problem much more manageable.
Because the AI is programmed by people who hadn’t thought of this issue, and the other way turned out to be simpler/easier?
I know. The problem is that inconsistency is unstable (which is why we're using other measures to maintain it, e.g. using a tool AI only). That's one of the reasons I was interested in stable versions of these kinds of unstable motivations: http://lesswrong.com/r/discussion/lw/lws/closest_stable_alternative_preferences/
Ok, but if this is a narrow AI rather than an AGI agent used for that particular activity, then it seems intuitive to me that designing it to plan over a single task at a time would be simpler.
The post you linked doesn't deal with dynamic inconsistency. It refers to agents that are expected utility maximizers under Von Neumann–Morgenstern utility theory, but this theory only deals with one-shot decision making, not decision making over time.
You can reduce the problem of decision making over time to one-shot decision making by combining instantaneous utilities into a cumulative utility function (*) and then using it as a one-shot utility function.
If you combine the instantaneous utilities by their (exponentially discounted) sum over an infinite time horizon, you obtain a dynamically consistent expected utility maximizer. But if at each instant you sum utilities only over a bounded horizon ahead of the current time, you still obtain an agent that is an expected utility maximizer at each instant, yet it is not dynamically consistent.
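As a rough sketch of why (standard notation, with $u_t$ the instantaneous utility at time $t$): with exponential discounting over an infinite horizon the objective at time $t$ is
$$U_t = \sum_{k=0}^{\infty} \gamma^{k}\, u_{t+k} = u_t + \gamma\, U_{t+1},$$
so the time-$t$ agent ranks everything from $t+1$ onward exactly as its time-$(t+1)$ self will, which is what dynamic consistency means. With a receding horizon of length $H$ the objective is
$$U_t = \sum_{k=0}^{H} u_{t+k},$$
so at time $t$ the agent puts zero weight on $u_{t+H+1}$, while at time $t+1$ it puts full weight on it; the two selves can prefer different plans, hence the dynamic inconsistency.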
You may argue that dynamic inconsistency is not stable under evolution by random mutations and natural selection, but it is not obvious to me that AIs would face such a scenario. Even an AI that modifies itself or generates successors has no incentive to maximize its evolutionary fitness unless you specifically program it to do so.
Actually, you could use corrigibility to get dynamic inconsistency: https://intelligence.org/2014/10/18/new-report-corrigibility/