“Any form of generalization can be represented by a function on behavior which produces its results and yields actions based on them”: I’m not following you here. Can you give me an example of a model of behavior that isn’t purely observational, in the sense that it can’t be represented as a function of the full history of actions and responses? Any model with such a representation admits a utility function that simply checks whether each past action adhered to said function.
A purely observational model of behaviour is simply a list of actions that have actually been observed, and the histories of the universe that led to them. For example, with my trivial agent you could observe:
“With the history of the universe being just the list of states [A], it performed action b leading to state B. With the list being [AB] it performed action c leading to state C. With the list being [ABC] it performed action a leading to state A.”
From this model you can conclude that if the universe were somehow rewound and placed into state A, the agent would once again perform action b. This agent is deterministic.
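As a minimal sketch (in Python; the names and encoding are my own invention, not anything from this thread), a purely observational model is nothing more than a finite lookup table from observed histories to observed actions, and determinism is just the statement that querying the same history always returns the same entry:

```python
# A purely observational model: just the finite record of what was seen.
# Keys are histories of universe states, values are the observed actions.
observed = {
    ("A",): "b",            # with history [A], the agent performed b
    ("A", "B"): "c",        # with history [AB], it performed c
    ("A", "B", "C"): "a",   # with history [ABC], it performed a
}

def replay(history):
    """If the universe were rewound to a recorded history, determinism
    says the agent would repeat the recorded action. The model is silent
    on any history it has not seen."""
    return observed.get(tuple(history))  # None for unseen histories

assert replay(["A"]) == "b"                  # rewound to state A: b again
assert replay(["A", "B", "C", "A"]) is None  # no observation, no prediction
```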
From these observations you can fit any utility function with U(ABCA) > U(ABC) > U(AB) > U(A). But any such function is useless: the history of the universe now contains the states ABCA, and you can’t in fact roll back the universe. In particular, you have no idea whether U(ABCAB) > U(ABCAA), because your observations don’t tell you.
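To make that concrete, here is a sketch (two hypothetical utility functions of my own construction) showing that the observed ordering pins down nothing beyond the observations: both functions fit U(ABCA) > U(ABC) > U(AB) > U(A), yet they disagree about the unobserved comparison.

```python
# Two utility functions over histories, written as strings of states.
# Both satisfy the observed ordering U(ABCA) > U(ABC) > U(AB) > U(A)...
def u1(h):
    return len(h) + (0.5 if h.endswith("B") else 0.0)

def u2(h):
    return len(h) + (0.5 if h.endswith("A") else 0.0)

for u in (u1, u2):
    assert u("ABCA") > u("ABC") > u("AB") > u("A")

# ...but they disagree about histories the observations never covered:
print(u1("ABCAB") > u1("ABCAA"))  # True:  u1 prefers reaching B next
print(u2("ABCAB") > u2("ABCAA"))  # False: u2 prefers reaching A next
```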
There are infinitely many behavioural rules that are not purely observational, but are compatible with the observations. Some of them allow predictions, some of them don’t. Independently of that, some of them are compatible with a utility function, some of them aren’t.
The rules I gave for my agent are not purely observational—they are the actual rules that the agent uses for its actions (in a simplified, quantized universe) and not just some finite set of observations. The behavioural model corresponding to those rules is incompatible with every utility function.
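One common reading of that incompatibility claim (my interpretation; the exchange itself doesn’t spell it out) is that no utility function over individual states can rationalize the endless cycle, because the revealed preferences would be intransitive. A quick check in Python, under the added assumption that the agent could also have stayed put in each state, so each move reveals a strict preference:

```python
from itertools import permutations

# The agent's rule cycles forever: in A it moves to B, in B to C, in C to A.
moves = {"A": "B", "B": "C", "C": "A"}

# If each move reveals a strict preference for the destination over the
# current state, we would need U(B) > U(A), U(C) > U(B), and U(A) > U(C)
# simultaneously. No assignment of utilities to the three states works:
rationalizable = any(
    all(u[dst] > u[src] for src, dst in moves.items())
    for perm in permutations(range(3))
    for u in [dict(zip("ABC", perm))]
)
print(rationalizable)  # False: the three inequalities form a contradiction
```

Note that this only rules out utility functions over states; the reply below sidesteps it by defining utility over entire histories.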
In that case, “purely observational” would describe an expectation for behavior and not the actual pattern of behavior. This is not at all what the conversion I described involves.
Remember: I’m allowing unlimited memory, taking into account the full history of inputs and outputs (i.e. environmental information received and agent response).
In your example, the history X might be A(ab)B(bc)C(ca)A, where (pq) denotes the action that happens to cause the environment to produce state Q after state P. In this case, the behavioral function B(X) would yield (ab).
Meanwhile, a suitable utility function U(X) would just need to prefer all sequences where each input A is followed by (ab), and so on, to those where that doesn’t hold. Since your scenario entails complete information, the utility function could simply prefer sequences where B follows A; either way, this trivially generates the behavior.
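Here is a minimal sketch of that conversion (Python; the list encoding of histories and the helper names are my own). The behavioral function prescribes the next action from the full history, and the utility function awards 1 exactly when every recorded action adhered to that prescription; an agent greedily maximizing this utility reproduces the original behavior step by step:

```python
# Behavioral function: given the history so far, prescribe the next action.
# The action (pq) is written as the string "pq": taken in state P, it
# causes the environment to produce state Q.
def behavior(history):
    state = history[-1]                      # last element is a state
    return {"A": "ab", "B": "bc", "C": "ca"}[state]

# Utility over full histories: 1 if every past action adhered to the
# behavioral function, 0 otherwise.
def utility(history):
    for i in range(1, len(history), 2):      # odd positions hold actions
        if history[i] != behavior(history[:i]):
            return 0
    return 1

# A U-maximizing agent picks, at each step, the action that keeps utility
# at 1 -- which is exactly the action behavior() prescribes.
h = ["A"]
for _ in range(3):
    act = max(["ab", "bc", "ca"], key=lambda a: utility(h + [a]))
    h.append(act)
    h.append(act[1].upper())                 # environment produces next state
print(h)  # ['A', 'ab', 'B', 'bc', 'C', 'ca', 'A']
```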