If the “Cartesian barrier” is such a show-stopper, then why is it non-trivial to prove that I’m not a brain in a vat remotely puppeteering a meat robot?
A Cartesian agent that has detailed and accurate models of its hardware still won’t recognize that dramatic damage or upgrades to its software are possible...
Was nobody intelligent before the advent of neuroscience? Do people need to know neuroscience before they qualify as intelligent agents? Are there no intelligent animals?
I’m really not sure how to interpret the requirement that an agent know about software upgrades. There is a system called a Gödel Machine that’s compatible with AIXI(tl), and it’s all about self-modification; however, I don’t know of many real-world examples of intelligent agents concerned with whatever the equivalent of a software upgrade would be for a brain.
How do rewards help?
Rewards help by filtering out world models where doing dangerous things has a high expected reward. Remember that AIXI includes reward in its world models and exponentially devalues long world models. If the reward signal drops as AIXI pilots its body close to fire, lava, acid, sharks, etc., then the world model that says “don’t damage your body” is much shorter than the one that says “don’t go near fire, lava, acid, or sharks, but maybe dropping an anvil on your head is a great idea!”
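To put rough numbers on that, here’s a toy sketch of the length-weighting argument. The hypothesis names and bit counts are invented for illustration; this is not Hutter’s formalism, just the shape of the Occam prior:

```python
# Toy illustration of the length-weighting argument (my own sketch, not
# Hutter's actual formalism): each hypothesis gets prior weight 2^-K,
# where K is its description length in bits. The bit counts are invented.

hypotheses = {
    # short, general rule: "damage to the body reduces reward"
    "protect_the_body": 40,
    # long, ad-hoc rule: lists every hazard seen so far, then carves out
    # an arbitrary exception for anvils
    "hazard_list_plus_anvil_exception": 400,
}

def prior(bits: int) -> float:
    """Solomonoff-style prior: exponentially smaller weight for longer models."""
    return 2.0 ** -bits

total = sum(prior(bits) for bits in hypotheses.values())
for name, bits in hypotheses.items():
    print(f"{name}: normalized prior mass {prior(bits) / total:.3e}")

# The general "protect the body" model outweighs the anvil-exception model
# by a factor of 2^360, so it dominates the expected-reward estimates.
```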
Some dangers give no experiential warning until it’s too late.
That’s not an AIXI thing. That’s a problem for all agents.
The anvil “paradox” simply illustrates the essential intractability of tabula rasa learning in general, but it’s not like you couldn’t initialize AIXI with some a priori knowledge.
In a totally tabula rasa set-up, an agent can’t know if anything it outputs will yield arbitrarily high or low reward. That’s not unique to AIXI. It’s also not unique to AIXI that it can only infer the concept of mortality.
The first problem is that you’re teaching AIXI to predict what the programmers think is deadly, not what’s actually deadly.
Did your parents teach you what they think is deadly, or were you born with innate knowledge of death? How exactly is it that you came to suspect that dropping an anvil on your head isn’t a good idea? Were your parents perfect programmers?
The second problem is that you’re teaching AIXI to fear small, transient punishments. But maybe it hypothesizes that there’s a big heap of reward at the bottom of the cliff.
So you are saying it can’t generalize. That’s exactly what you’re saying.
Teenagers do horribly dangerous things all the time for the dubious reward of impressing their peers, yet this machine that’s diligently applying inductive inference to determine the provably optimal actions fails to meet your yet-to-be-described standard for intelligence if it’s unlucky?
Also, why is the reward function the only means of feeding this agent data? Couldn’t you just tell it that jumping off a cliff is a bad idea? Do you think it undermines the intelligence of a child to tell it to look both ways before crossing the street?
The punishment has to be large enough that AIXI fears falling off cliffs about as much as we’d like it to fear death.
Why? Prove that a small punishment wouldn’t work. If you give AIXI heat sensors so that it gradually gets more and more punishment as it approaches a fire, show me how Occam’s razor wouldn’t prevail and say “I bet you’ll get even more punishment if you get even closer to the fire.” Where does the model that says there’s a pot of gold in a lava pit come from? How does it end up drowning out literally every other world model? Explain it. Don’t just say “it could happen, therefore AIXI isn’t perfect, and it has to be perfect to be intelligent.”
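Here’s roughly what I have in mind, with made-up description lengths and reward values. Both models explain the “hotter means more punishment” observations equally well, so everything comes down to the prior, and the pot-of-gold model contributes essentially nothing:

```python
# Rough sketch of the Occam argument with invented numbers: two models that
# both fit the observed data "punishment grows as I approach the fire", so
# the posterior ratio between them is just the prior ratio.

K_MONOTONE = 50      # "closer to the fire means lower reward": short to describe
K_POT_OF_GOLD = 300  # same rule, plus "except +1000 reward at distance zero"

def predicted_reward(model: str, distance: float) -> float:
    burn = -10.0 / (distance + 1.0)  # punishment grows as distance shrinks
    if model == "pot_of_gold" and distance == 0:
        return 1000.0
    return burn

w_monotone = 2.0 ** -K_MONOTONE
w_gold = 2.0 ** -K_POT_OF_GOLD
total = w_monotone + w_gold

expected_at_fire = (
    w_monotone * predicted_reward("monotone", 0)
    + w_gold * predicted_reward("pot_of_gold", 0)
) / total
print(f"expected reward for stepping into the fire: {expected_at_fire:.6f}")
# ~ -10.0: the pot-of-gold model's weight (2^-300 vs 2^-50) is nowhere near
# enough to drown out the monotone model, so the danger estimate prevails.
```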
I also think I’m like those other brains. AIXI doesn’t. In fact, since the whole agent AIXI isn’t in AIXI’s hypothesis space — and the whole agent AIXItl isn’t in AIXItl’s hypothesis space — even if two physically identical AIXI-type agents ran into each other, they could never fully understand each other.
No agent can perfectly emulate itself. AIXItl can have an approximate self-model just like any other agent. It would have an incomplete world-model otherwise. It can “think it’s like other brains” too. That’s also Occam’s razor. A world model where you assume others are like you is shorter than a world model where other agents are completely alien. Imperfect ≠ incapable. You’re applying a double standard.
neither one could ever draw direct inferences from its twin’s computations to its own computations.
Why not?
Hutter defined AIXItl such that it can’t conclude that it will die
AIXItl can absolutely develop a world model that includes a mortal self-model. I think what you’re arguing is that, since there will always be a world model where jumping off a cliff yields some reward, it will never hypothesize that its future reward goes to zero. It will always assume there’s some chance it will live. That’s not irrational. There is some chance it will live. That chance never technically goes to zero. That’s very different from thinking jumping off a cliff is the optimal action; a quick expected-value sketch below makes the distinction concrete.
Or maybe you’re saying that if you expand the tree of future actions, you are supposing that you can take those actions? Not in the world models that say you’re dead. Those will continue to yield zero reward after all your agency is obliterated.
Imagine you had a mech suit, and you could lift a car. Your world model will include that mech suit. It will also include the possibility that said mech suit is destroyed. Then you can’t lift a car.
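To make the cliff point concrete, here’s the back-of-the-envelope expected-value sketch mentioned above; every number in it is made up, and the point is only the comparison:

```python
# A minimal expected-value check with made-up numbers: assigning a nonzero
# probability to surviving the fall is not the same as choosing to jump.

p_survive = 1e-6            # AIXItl never drives this exactly to zero
reward_if_survive = 100.0   # hypothetical payoff in the "I lived" world models
reward_if_dead = 0.0        # the dead-agent world models predict zero forever
reward_for_staying = 1.0    # modest but reliable reward for not jumping

ev_jump = p_survive * reward_if_survive + (1 - p_survive) * reward_if_dead
ev_stay = reward_for_staying

print(f"E[return | jump] = {ev_jump:.6f}")  # 0.000100
print(f"E[return | stay] = {ev_stay:.6f}")  # 1.000000
# Staying is optimal even though the "I might live" hypothesis never gets
# probability exactly zero.
```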
I’m done for now. I really don’t like this straw-man conversation style of article. Why can’t you argue actual points against actual people?