Let us now weaken our assumptions by replacing the deterministic environment q with a probability distribution µ(q) over chronological functions. Here µ might be interpreted in two ways. Either the environment itself behaves stochastically defined by µ or the true environment is deterministic, but we only have subjective (probabilistic) information of which environment is the true environment. Combinations of both cases are also possible. We assume here that µ is known and describes the true stochastic behavior of the environment. The case of unknown µ with the agent having some beliefs about the environment lies at the heart of the AIξ model described in Section 4.
The best or most intelligent agent is now the one that maximizes the expected utility (called value function) $V^{p}_{\mu} \equiv V^{p\mu}_{1m} := \sum_{q} \mu(q)\, V^{pq}_{1m}$. This defines the AIµ model.
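To make the definition concrete, here is a minimal sketch of the expectation above for a finite toy case. Everything in it is my own illustration rather than anything from the paper: policies are just fixed action sequences, the two environments and their probabilities are made up, and names like `expected_value` and `best_policy` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Toy assumptions (not from the paper): a policy p is a fixed action
# sequence, and a deterministic environment q maps it to per-step rewards.
Policy = Tuple[int, ...]
Environment = Callable[[Policy], Sequence[float]]


def value_in_env(policy: Policy, env: Environment) -> float:
    """V^{pq}_{1m}: total reward of policy p in deterministic environment q."""
    return sum(env(policy))


def expected_value(policy: Policy, mu: List[Tuple[Environment, float]]) -> float:
    """V^p_mu = sum over q of mu(q) * V^{pq}_{1m}."""
    return sum(prob * value_in_env(policy, env) for env, prob in mu)


def best_policy(policies: List[Policy], mu: List[Tuple[Environment, float]]) -> Policy:
    """The policy that maximizes the mu-expected value."""
    return max(policies, key=lambda p: expected_value(p, mu))


# Two made-up environments: one rewards action 1 at each step, the other action 0.
def rewards_action_one(policy: Policy) -> List[float]:
    return [1.0 if a == 1 else 0.0 for a in policy]


def rewards_action_zero(policy: Policy) -> List[float]:
    return [1.0 if a == 0 else 0.0 for a in policy]


# mu(q): a known distribution over the two environments (made-up numbers).
mu = [(rewards_action_one, 0.75), (rewards_action_zero, 0.25)]
policies = [(0, 0), (0, 1), (1, 0), (1, 1)]
best = best_policy(policies, mu)
```

With these made-up numbers, always choosing action 1 has the highest expected value, since the environment that rewards it is three times as likely. The real AIµ model ranges over all chronological functions and interactive policies, which this sketch does not attempt.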
If I’m skimming the document correctly (I haven’t read it in any detail), building up the AIµ model is part of later turning it into the AIξ model, which is AIXI. From the end of the section:
To get our final universal AI model the idea is to replace µ by the universal probability ξ, defined later.
And section 4:
The main idea of this work is to generalize universal induction to the general agent model described in Section 2. For this, we generalize ξ to include actions as conditions and replace µ by ξ in the rational agent model, resulting in the AIξ(=AIXI) model. In this way the problem that the true prior probability µ is usually unknown is solved. Convergence of ξ→µ can be shown, indicating that the AIξ model could behave optimally in any computable but unknown environment with reinforcement feedback.
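As a rough intuition for the "convergence of ξ→µ" claim, here is a small sketch of a Bayesian mixture learning an unknown environment. It is heavily simplified and is my own assumption-laden toy, not Hutter's construction: the hypothesis class is five Bernoulli coin biases with a uniform prior (the real ξ is a universal mixture, not a finite one), there are no actions or rewards, and all function names are hypothetical.

```python
import random
from typing import List


def mixture_prediction(thetas: List[float], weights: List[float]) -> float:
    """xi(next bit = 1): weighted average of each hypothesis's prediction."""
    return sum(w * t for t, w in zip(thetas, weights))


def bayes_update(thetas: List[float], weights: List[float], bit: int) -> List[float]:
    """Posterior weights after observing one bit."""
    likelihoods = [t if bit == 1 else 1.0 - t for t in thetas]
    unnormalized = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def run(true_theta: float = 0.7, steps: int = 2000, seed: int = 0) -> float:
    """Sample bits from the true environment mu and return xi's final prediction."""
    rng = random.Random(seed)
    thetas = [0.1, 0.3, 0.5, 0.7, 0.9]           # toy hypothesis class
    weights = [1.0 / len(thetas)] * len(thetas)  # uniform prior (an assumption)
    for _ in range(steps):
        bit = 1 if rng.random() < true_theta else 0
        weights = bayes_update(thetas, weights, bit)
    return mixture_prediction(thetas, weights)


final_prediction = run()
```

After enough observations the posterior weight concentrates on the hypothesis matching the true bias, so the mixture's prediction ends up close to the true probability even though the agent was never told which hypothesis is correct.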
The quoted passages are from Marcus Hutter’s “Universal Algorithmic Intelligence: A mathematical top->down approach”, sections 2.4 and 4.