I read the latter one, and from a brief glance I think the first one is essentially the same paper. It uses some tricks to get a relatively efficient algorithm in the special case where all the agent has to do is recognize some simple patterns in the environment, patterns somewhat weaker than regular expressions. It would never be able to learn that, in general, the distance the ball bounces up is 50% of the distance it fell; but if the falling distance were quantized, the bouncing distance were quantized, and there were a maximum height the ball could fall from, it could eventually memorize all possible combinations.
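To make that distinction concrete, here is a minimal sketch (all names hypothetical, not from either paper) contrasting a tabular learner, which must see every quantized fall height before it can predict its bounce, with a learner that infers the general 50% ratio from a couple of observations:

```python
# Hypothetical illustration: tabular memorization vs. learning a general rule.

def tabular_predict(table, fall):
    # A lookup table knows nothing about a fall height it has never observed.
    return table.get(fall)

def ratio_predict(examples, fall):
    # Fit a single ratio bounce = r * fall from the observed examples,
    # then generalize it to any fall height.
    r = sum(b / f for f, b in examples) / len(examples)
    return r * fall

# Observed (fall height, bounce height) pairs.
observations = [(10, 5.0), (4, 2.0)]

table = dict(observations)
print(tabular_predict(table, 6))       # None: this height was never seen
print(ratio_predict(observations, 6))  # 3.0: the 50% rule generalizes
```

The table-based learner can still cover the whole domain if fall heights are quantized and bounded, but only by enumerating every case, which is the point being made above.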
They also gave up on AIXI's property that experimentation needs no special treatment, and instead use other heuristics to decide how much to experiment and how often to take the best known action for a short-term reward.
I believe they’re doing the best they can, and for all I know it might be state of the art, but it isn’t general intelligence.