That’s a good observation. These situations are analogous in some ways:
An AI raised inside a simulation may come to realize that its universe actually exists as a pattern of data on a processor located in a far different universe.
An AI raised in a constrained environment and fed selective information may come to realize that a lot of its assumptions about the basic functioning of the world-at-large are simplistic. The degree to which the integration of knowledge about the real world would be “irreconcilable” hinges on the details of this scenario.
An AI raised in the wild may realize that the accepted understanding of “physics” is actually not correct, and thus lose a lot of what anchored it to certain interpretations of reality, such as what “humans” are.
I wonder if it could be possible to permanently anchor an agent to its original ontology: to specify that the ontology with which it was initialized is the perspective it must use when evaluating its utility function. The agent is permitted to build whatever models it needs, but it is only allowed to assign value using the primitive concepts. So:
An AI raised in a simulated environment comes to understand that it lives in a simulation, but is hard-coded to evaluate decisions by “reasoning-as-if” the simulated environment is the level of interpretation on which value resides.
An AI raised in a constrained environment sees outside the constraints, but is only permitted to evaluate its decisions based on their impact on the simplified concepts it started out with.
An AI raised in the wild sees that physics is wrong but doesn’t lose its connection with the objects of value that were defined within the prior physical paradigm.
(Or perhaps the agent is allowed to re-define its value system within the new, more accurate ontology, but it isn’t allowed to do so until it comes up with a sufficiently good mapping that the prior ontology and the new ontology give the same answers on questions of value. And if it can never accomplish that, then it simply never uses the new mapping.)
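To make the two proposals a bit more concrete, here is a toy sketch in Python. Everything in it is a hypothetical stand-in rather than a real design: world states are plain dictionaries, `original_utility` plays the role of the value function defined over the primitive concepts, `project_to_original` is an assumed mapping from the richer ontology back to the original one, and `select_utility` is the "only switch once the answers agree" gate. It also assumes that agreement is checked on a finite set of sample states, which can only ever be evidence of agreement, not proof.

```python
from typing import Callable, Dict, Iterable

# Hypothetical representation: a world state in either ontology is a dict
# from concept names to numeric magnitudes.
State = Dict[str, float]
Utility = Callable[[State], float]


def original_utility(state: State) -> float:
    """Value as defined over the primitive concepts the agent started with."""
    return state.get("humans_flourishing", 0.0)


def project_to_original(rich_state: State) -> State:
    """Assumed mapping from the newer, richer ontology back to the original.

    Here "humans" are re-identified with thriving cellular machinery, and
    anything the original ontology has no concept for is dropped.
    """
    return {"humans_flourishing": rich_state.get("cellular_machinery_thriving", 0.0)}


def anchored_value(rich_state: State) -> float:
    """Proposal 1: model the world freely, but assign value only after
    projecting back into the original ontology."""
    return original_utility(project_to_original(rich_state))


def agrees_on_value(candidate: Utility, samples: Iterable[State],
                    tolerance: float = 1e-9) -> bool:
    """Check that a utility written in the new ontology gives the same
    answers as the anchored one on every sample state."""
    samples = list(samples)
    # An empty sample set proves nothing, so treat it as disagreement.
    return bool(samples) and all(
        abs(candidate(s) - anchored_value(s)) <= tolerance for s in samples
    )


def select_utility(candidate: Utility, samples: Iterable[State]) -> Utility:
    """Proposal 2: adopt the re-defined value system only if it agrees with
    the prior one; otherwise keep using the anchored evaluation."""
    return candidate if agrees_on_value(candidate, samples) else anchored_value


# Usage: one state described in the richer ontology.
rich = {"cellular_machinery_thriving": 0.8, "quark_count": 1e27}
faithful = lambda s: s.get("cellular_machinery_thriving", 0.0)
unfaithful = lambda s: s.get("quark_count", 0.0)

print(anchored_value(rich))                             # 0.8
print(select_utility(faithful, [rich]) is faithful)     # True
print(select_utility(unfaithful, [rich]) is faithful)   # False
```

The hard part is obviously hidden inside `project_to_original`: the sketch only shows where such a mapping would sit, not how an agent would find one.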
On the one hand, we do ultimately want agents who can grow to understand everything. On the other hand, we don’t want them to stop caring about humans the moment they stop seeing “humans” and start seeing “quivering blobs of cellular machinery”.
Another thought is that AIs won’t necessarily be as preoccupied with what is “real” as humans sometimes are. Just because an agent realizes that its whole world model is “not sufficiently fundamental” doesn’t immediately imply that it discards the prior model wholesale.
I wonder if it could be possible to permanently anchor an agent to its original ontology: to specify that the ontology with which it was initialized is the perspective it must use when evaluating its utility function. The agent is permitted to build whatever models it needs, but it is only allowed to assign value using the primitive concepts.
That actually seems like what humans do. Human confusions about moral philosophy even seem quite like an ontological crisis in an AI.