I see: so you're describing a purely input-based, momentary utility function, one that can rely only on the time-independent response from the environment. For the incomplete-information circumstances I'm modeling, agents representable this way would have to be ridiculously stupid, since they couldn't draw any connections between their actions and the feedback they receive, nor between separate instances of feedback. For example, a paperclip maximizer of this form could, at best, only check whether a paperclip currently appears in its sensory input.
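To make the restriction concrete, here is a minimal sketch in notation I'm introducing just for illustration (with $O$ the set of possible momentary observations): such an agent can only optimize something of the form

$$U : O \to \mathbb{R}, \qquad U(o_t) = \begin{cases} 1 & \text{if a paperclip appears in the current observation } o_t,\\ 0 & \text{otherwise,} \end{cases}$$

where nothing about past actions or earlier feedback can enter the evaluation.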
Do you see how, once we expand the utility function's scope to cover both the agent's actions and its full interaction history, constructing a "behavior-adherence" utility function becomes trivial?
By "utility function" here, I just mean a function encoding the agent's preferences (the thing it optimizes) over everything available to it. So, for any behavioral model, you can construct such a function that always prefers the agent's actions to relate to its information exactly as that model prescribes.
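Concretely, here is the trivial construction I have in mind (the notation is mine, purely for illustration): take any behavioral model as a map $\pi : H \to A$ from full histories to actions, and define

$$U_\pi(h, a) = \begin{cases} 1 & \text{if } a = \pi(h),\\ 0 & \text{otherwise.} \end{cases}$$

An agent that, at every history $h$, picks an action maximizing $U_\pi(h, \cdot)$ behaves exactly as $\pi$ prescribes, so every behavioral model is optimal for at least one utility function of this expanded kind.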
It sounds like this may not be what you associate with the term. Could you give me an example of a behavior pattern that isn't optimized by any utility function?