I wonder whether it would be possible to permanently anchor an agent to its original ontology: to specify that the ontology with which it was initialized is the perspective it is required to use when evaluating its utility function. The agent is permitted to build whatever models it needs to build, but it is only allowed to assign value using the primitive concepts.
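A minimal sketch of what I have in mind (my own illustration, not anything from the literature; the class and function names are hypothetical): the utility function is defined once, purely over the primitive concepts, and never changes. The agent may adopt richer world models later, but those models only touch value through a translation back into the original ontology.

```python
from typing import Callable, Dict

# Hypothetical: a state described only in the primitive concepts the agent started with.
PrimitiveState = Dict[str, float]


class OntologyAnchoredAgent:
    def __init__(
        self,
        utility: Callable[[PrimitiveState], float],
        translate: Callable[[object], PrimitiveState],
    ):
        # The utility function is fixed at initialization and is defined
        # purely over the primitive ontology.
        self._utility = utility
        # The translation maps whatever the current world model produces
        # back into primitive concepts; it is the only bridge to value.
        self._translate = translate

    def update_translation(self, translate: Callable[[object], PrimitiveState]) -> None:
        # The agent may refine how its newer, richer models map onto the
        # old ontology as it learns more about the world...
        self._translate = translate

    def evaluate(self, model_state: object) -> float:
        # ...but value is always assigned in the original terms.
        return self._utility(self._translate(model_state))


# Illustrative usage: the primitive concept is "apples"; a more detailed model
# might track molecules, but value still attaches to the apple count it implies.
agent = OntologyAnchoredAgent(
    utility=lambda s: s.get("apples", 0.0),
    translate=lambda model_state: {"apples": float(model_state)},
)
print(agent.evaluate(3))  # 3.0
```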
That actually seems like what humans do. Human confusions about moral philosophy even seem quite like an ontological crisis in an AI.