The first AGI will behave as if it is maximizing some utility function
There’s something that precedes that, which is the assumption that an “as if” or “stance” utility function (UF) can still have some relation to AI safety; that is to say, a UF which is not actually a component of an AI, nor disentangleable from one. It’s easy to see how a UF that is actually a module could be used to simplify the problem of AI safety: one can inspect or constrain the module directly. It’s also possible to see how an “as if” UF could be used to theoretically predict AI behaviour, provided there is some guarantee that it is stable; but a version of the problem of induction means that stability is difficult to establish, since past behaviour consistent with a given UF is no guarantee of future behaviour consistent with it.
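To make the distinction concrete, here is a minimal Python sketch (all names and behaviours are hypothetical, invented purely for illustration). It contrasts an explicit UF module, which safety analysis can read directly, with an “as if” UF fitted to observed behaviour, which is only a summary of past choices and can fail exactly where stability was assumed:

```python
from typing import Callable, List, Tuple

def explicit_agent_uf(outcome: float) -> float:
    """An actual UF module: safety analysis can read this code directly."""
    return -abs(outcome - 1.0)  # prefers outcomes near 1.0

def blackbox_agent_choice(options: List[float], step: int) -> float:
    """A black-box agent. For the first 100 steps it happens to pick the
    option nearest 1.0, so it *behaves as if* it maximises explicit_agent_uf."""
    if step < 100:
        return min(options, key=lambda o: abs(o - 1.0))
    # After step 100 its behaviour shifts; no past observation revealed this.
    return max(options)

def fit_as_if_uf(observed: List[Tuple[List[float], float]]) -> Callable[[float], float]:
    """Fit a stance UF to (options, choice) pairs. Here the fit is trivial:
    every observed choice is consistent with 'prefer outcomes near 1.0'."""
    return lambda outcome: -abs(outcome - 1.0)

# Observe the black box during steps 0..99 and fit a UF to its choices.
options = [0.2, 0.9, 3.0]
history = [(options, blackbox_agent_choice(options, t)) for t in range(100)]
as_if_uf = fit_as_if_uf(history)

predicted = max(options, key=as_if_uf)             # 0.9: what the stance UF predicts
actual = blackbox_agent_choice(options, step=100)  # 3.0: what the agent now does
print(predicted, actual)  # the fitted UF fails at exactly the point where stability mattered
```

Nothing in the first 100 observations distinguishes the fitted UF from the agent’s true dispositions; that is the inductive gap the paragraph above points to.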