The quantity called Bayesian surprise expresses what you’re trying to say. It is the KL divergence between the guessed probability distribution and the ‘actual’ distribution: here, the distance between the distribution over moves computed by your model (RYK) and Kasparov’s actual distribution over moves (perhaps uniform over the set of the ‘top 5’ moves according to some scoring heuristic). When these are far apart in KL divergence, your model will be greatly surprised. (I recall a quote of yours from a post somewhere: “There are no surprising facts, only models that are surprised by facts. And if a model is surprised by facts, then it is no credit to that model.”)
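For concreteness, the discrete form of the divergence, written in the direction that makes the support argument below come out (P the ‘actual’ distribution over moves, Q the model’s guess), is

D(P || Q) = \sum_m P(m) \log \frac{P(m)}{Q(m)},

where the sum runs over legal moves m. It is large exactly when Q puts little probability on moves that P makes likely.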
Bayesian surprise is experimentally linked to focus-of-attention mechanisms in human vision. If I need a good model of my world to avoid being hit by cars or eaten by bears, then I had better divert precious resources toward scrutinizing events of high Bayesian surprise (high shock to my current model) more intensely. This will almost certainly play a large role in understanding the specific biological mechanisms that lead to cognitive biases.
I also think that playing illustrates a use of Bayesian surprise. You are able to modify your map and discover new, perhaps unanticipated components of the territory, and (this is what distinguishes ‘playing’ from other types of map updates) you feel safe while you do it. When children learn to swim, they usually have adults around to protect them, and though there is an element of danger, they feel reasonably safe while having their model of buoyancy, gravity, and movement totally surprised (in the Bayesian sense) by the experience of being in the water. This combination of safety and simultaneous map updating is what we refer to as fun, at least in many cases.
At any rate, I have noticed a number of places where this appeal to surprise is made and could benefit from being rigorously linked to actual Bayesian surprise. For example, given two probability distributions P and Q, D(P || Q) = \infty whenever P assigns positive probability to an outcome to which Q assigns zero probability (that is, whenever the support of P is not contained in the support of Q). Perhaps Engelbart, having invented the mouse, was able to extrapolate and anticipate a space of computer tools that could possibly be invented, and he had some internal probability distribution over this space. But it’s not the aspects of modern computing that fit easily into that probability model of his which cause surprise or problems. If he understood mice conceptually, then trackballs clearly aren’t surprising. Probably pen-based computing and even tablets are not surprising. But natural language processing allowing voice control might have been. Or brain-computer interfaces might have been… etc. Something in device space currently has non-zero probability that had zero probability in his model, and so there is infinite surprise between his model and the current situation. Maybe in concept space he did better. But by analogy, device space stands to Engelbart and us as concept space stands to us and a FOOM’d AI. I certainly believe that a FOOM’d AI’s probability model about various issues will have a strictly larger support set than my own would. There are concepts, heuristics, and metaheuristics that it will be able to assign meaningful probability to and of which I won’t even be able to conceive. There will be infinite surprise.
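To make the support point concrete, here is a minimal sketch in Python. The outcome names and probabilities are purely hypothetical placeholders for ‘device space’, not anything Engelbart actually believed; the point is only that one zero-probability outcome in the model is enough to make the divergence infinite.

```python
import numpy as np

def kl_divergence(p, q):
    """D(P || Q) = sum_x P(x) * log(P(x) / Q(x)) for discrete distributions.

    Terms with P(x) == 0 contribute nothing; any outcome with P(x) > 0
    but Q(x) == 0 drives the divergence to infinity.
    """
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    if np.any((p > 0) & (q == 0)):
        return np.inf
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

# Hypothetical 'device space' outcomes; the last two stand in for inventions
# the model never conceived of (zero prior probability).
#          mouse  trackball  pen    voice  bci
actual = [0.30,  0.10,      0.20,  0.25,  0.15]  # how things actually turned out
model  = [0.50,  0.30,      0.20,  0.00,  0.00]  # an Engelbart-style prior

print(kl_divergence(actual, model))  # inf: infinite Bayesian surprise
print(kl_divergence(model, actual))  # finite: the model's support lies inside the actual support
```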
All cases where humans can abstract problems (of economics, physics, mathematics, etc.) and accurately predict things in solution space indicate domains where the surprise isn’t infinite: there are salient search directions to investigate that produce useful results. This is Robin’s FOOM argument as I understand it from the debate. But it fails to consider cases where there literally are no salient search directions… at least not to us, because we can’t model them, or because the ones we can model are pitifully inadequate by comparison.