I don’t see the point in adding so much complexity to such a simple matter. AIXI is an incomputable agent whose optimality proofs require a computable environment. It requires a specific configuration of the classic agent-environment loop in which the agent and the environment are independent machines. That configuration is only applicable to a subset of real-world problems where the environment can be assumed to be much “smaller” than the agent operating on it: problems that don’t involve other agents and that have very few degrees of freedom relative to that agent.
Marcus Hutter already proposed computable versions of AIXI, such as AIXItl. In the context of agent-environment loops, AIXItl is actually more general than AIXI, because AIXItl can be applied to every configuration of the agent-environment loop, including the embedded-agent configuration. AIXI is the limit of AIXItl as the time bound “t” and the length bound “l” go to infinity.
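To make the “l” and “t” bounds concrete, here is a toy sketch of that kind of bounded search. This is my own illustration, not Hutter’s actual AIXItl construction: the tiny Brainfuck-style language, the `run` and `bounded_search` helpers, and the budgets are all invented for the example. The point is just that bounding program length by l and runtime by t makes the whole search computable by construction.

```python
from itertools import product

OPS = "+-.[]"   # increment, decrement, output, loop open, loop close

def run(prog, t):
    """Run `prog` for at most `t` interpreter steps.
    Returns the output list, or None if malformed or out of time."""
    # Precompute matching brackets; reject malformed programs.
    match, stack = {}, []
    for i, c in enumerate(prog):
        if c == "[":
            stack.append(i)
        elif c == "]":
            if not stack:
                return None
            j = stack.pop()
            match[i], match[j] = j, i
    if stack:
        return None
    cell, pc, out, steps = 0, 0, [], 0
    while pc < len(prog) and steps < t:
        c = prog[pc]
        if c == "+": cell = (cell + 1) % 256
        elif c == "-": cell = (cell - 1) % 256
        elif c == ".": out.append(cell)
        elif c == "[" and cell == 0: pc = match[pc]
        elif c == "]" and cell != 0: pc = match[pc]
        pc += 1
        steps += 1
    # Natural halt means pc ran off the end; otherwise the t-budget expired.
    return out if pc >= len(prog) else None

def bounded_search(target, l, t):
    """Enumerate every program of length <= l and keep the ones that,
    within t steps, print exactly `target`."""
    hits = []
    for n in range(1, l + 1):
        for prog in map("".join, product(OPS, repeat=n)):
            if run(prog, t) == target:
                hits.append(prog)
    return hits

print(bounded_search([2], l=4, t=50))   # finds "++." among others
```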
Some of the problems you bring up seem to be concerned with reconciling logic with probability, while others are concerned with real-world implementation. If your goal is to define concepts like “intelligence” with mathematical formalizations (which I believe is necessary), then you need to delineate that from real-world implementation. Discussing both simultaneously is extremely confusing. In the real world, an agent only has its empirical observations. It has no “seeds” to build logical proofs upon. That’s why scientists talk about theories and the evidence supporting them rather than proofs and axioms.
You can’t prove that the sun will rise tomorrow; you can only show that it’s reasonable to expect the sun to rise tomorrow based on your observations. Mathematics is the study of patterns, and mathematical notation is a language we invented to describe patterns. We can prove theorems in mathematics because we are the ones who decide the fundamental axioms. When we find patterns that don’t lend themselves easily to mathematical description, we rework the tool (adding concepts like zero, negative numbers, complex numbers, etc.). It happens that we live in a universe that seems to follow patterns, so we use mathematics to describe the patterns we see, and we design experiments to investigate the extent to which those patterns actually hold.
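A standard way to make this concrete (my addition; a textbook example rather than anything from the original discussion) is Laplace’s rule of succession: put a uniform prior on the unknown sunrise probability, and after n sunrises in a row the posterior probability of one more is (n + 1) / (n + 2). It approaches certainty but never reaches it.

```python
def prob_next_success(n_successes, n_trials):
    """Posterior predictive probability of success under a Beta(1, 1)
    (uniform) prior: (successes + 1) / (trials + 2)."""
    return (n_successes + 1) / (n_trials + 2)

# Probability the sun rises tomorrow, after n consecutive sunrises:
for n in (1, 10, 10_000):
    print(n, prob_next_success(n, n))   # ~0.667, ~0.917, ~0.9999
```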
The branch of mathematics for characterizing systems with incomplete information is probability theory. If you want to talk about real-world implementations, most non-trivial problems fall under this domain.
I disagree. This is like saying, “we don’t need fluid dynamics, we just need airplanes!” General mathematical formalizations like AIXI are just as important as special theories that apply more directly to real-world problems, like embedded agency. Without a grounded formal theory, we’re stumbling in the dark. You simply need to understand AIXI for what it is, a generalized theory; then most of the apparent paradoxes evaporate.
Kolmogorov complexity tells us that no lossless compression algorithm can shrink every input, yet people happily “zip” data every day. That doesn’t mean Kolmogorov wasted his time coming up with his general ideas about complexity. Real-world data tends to have a lot of structure because we live in a low-entropy universe. When you take a photo or record audio, it doesn’t look or sound like white noise, because there’s structure in the universe. In math-land, the vast majority of bit-strings would look and sound like incompressible white noise.
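That asymmetry is easy to check directly. A minimal demonstration using Python’s standard zlib (the particular strings and compression level are just for illustration):

```python
import os, zlib

structured = b"the sun rose this morning. " * 1000   # highly patterned
random_ish = os.urandom(len(structured))             # pseudo-random bytes

for name, data in (("structured", structured), ("random", random_ish)):
    ratio = len(zlib.compress(data, 9)) / len(data)
    print(f"{name}: compressed to {ratio:.3f} of original size")
# The structured data shrinks to a tiny fraction; the random data stays ~1.0.
```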
The same holds true for AIXI. The vast majority of problems drawn from problem space would essentially be “map this string of random bits to some other string of random bits,” in which case the best you can hope for is a brute-force tree search over all the possibilities, weighted by Occam’s razor (i.e., Solomonoff inductive inference).
I can’t speak to the motivations or processes of others, but these sound like assumptions without much basis. The reason I tend to define intelligence outside of the environment is that it generalizes much better. There are many problems where the system producing the solution can be decoupled, in both time and space, from the agent acting upon that solution. Agents solving problems in real time are a special case, not the general case. The general case is: an intelligent system produces a solution/policy to a problem, and an agent in an environment acts upon that solution/policy. An intelligent system might spend all night planning how to most efficiently route mail trucks the next morning; the drivers then follow those routes. A real-time model in which the driver has to plan her routes while driving is a special case. You can think of it as the driver’s brain coming up with the solution/policy and the driver acting on it in situ.
You could make the case that the driver has to do online/real-time problem solving to navigate the roads, avoid collisions, etc., in which case the full solution would be a hybrid of real-time and offline formulations (which is probably representative of most situations). Either way, constraining your definition of intelligence to only in-situ problem solving excludes many valid examples of intelligence.
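As a minimal sketch of that planner/actor split (every name here is a hypothetical stand-in, and the greedy nearest-neighbour tour is a toy, not a serious routing algorithm): the “intelligence” runs offline and emits a static policy, which is just data; the agent later executes it without re-planning.

```python
import math

def plan_route(depot, stops):
    """Offline 'intelligence': a greedy nearest-neighbour tour, standing in
    for whatever heavyweight optimization runs overnight."""
    route, here, todo = [], depot, list(stops)
    while todo:
        todo.sort(key=lambda p: math.dist(here, p))
        here = todo.pop(0)
        route.append(here)
    return route   # the policy is just data, decoupled from execution

def follow_route(route):
    """Online 'agent': executes the precomputed policy in situ."""
    for stop in route:
        print(f"driving to {stop}")   # real-time control would live here

policy = plan_route((0, 0), [(5, 1), (1, 2), (3, 3)])   # overnight
follow_route(policy)                                    # next morning
```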
Also, it doesn’t seem like you understand what Solomonoff inductive inference is. The weighted average is used because there will typically be multiple world models that explain your experiences at any given point in time, and Occam’s razor says to favor shorter explanations that give the same result, so you weight the predictions of each model by 2^(−l), where l is the model’s length in bits; longer models are penalized exponentially.
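A toy version of that weighted mixture, with the obvious caveat that real Solomonoff induction runs over all programs and is incomputable; the handful of hand-written “models” and their bit-lengths below are invented purely for illustration:

```python
MODELS = {
    # name: (length_in_bits, predictor(history) -> next bit); toy values
    "all ones":   (2, lambda h: 1),
    "all zeros":  (2, lambda h: 0),
    "alternate":  (3, lambda h: 1 - h[-1] if h else 0),
    "repeat 110": (5, lambda h: [1, 1, 0][len(h) % 3]),
}

def predict_next(history):
    """Weighted vote of every model still consistent with `history`,
    each weighted by 2^(-length): Occam's razor as a prior."""
    p_one = total = 0.0
    for bits, model in MODELS.values():
        # Keep only models that reproduce the observed history exactly.
        if all(model(history[:i]) == b for i, b in enumerate(history)):
            w = 2.0 ** -bits
            total += w
            p_one += w * model(history)
    return p_one / total if total else 0.5   # no survivors: total ignorance

# After seeing [1, 1], "all ones" (short) and "repeat 110" (longer) both
# survive, but the shorter model dominates the vote:
print(predict_next([1, 1]))   # ~0.889, leaning toward "next bit is 1"
```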
I think you’re confusing behavior with implementation. When people talk about neural nets being “universal function approximators,” they’re talking about input-output behavior, not implementation. Obviously the implementation of an XOR gate is different from that of a neural net that approximates an XOR gate.
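To make that behavior-vs-implementation distinction concrete, here is a hand-wired two-layer threshold network (weights chosen by hand for the example, not trained) whose input-output behavior is exactly XOR even though its internals look nothing like an XOR gate:

```python
def step(x):
    return 1 if x > 0 else 0

def xor_net(a, b):
    h1 = step(a + b - 0.5)        # fires iff at least one input is 1
    h2 = step(a + b - 1.5)        # fires iff both inputs are 1
    return step(h1 - h2 - 0.5)    # "at least one, but not both" = XOR

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))   # matches the XOR truth table
```

Same input-output behavior as the gate, completely different implementation.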