Machine learning can sorta do this, with human guidance. For instance, if we want to predict whether an animal is a dog or an elephant given its weight and its height, we could find a training set (containing a bunch of dogs and a bunch of elephants) and then fit 2 2-variate lognormal distributions to this training set—one for the dogs, and one for the elephants. (Using some sort of gradient descent, say). Then P(weight=w, height=h | species=s) is just the probability density at the point (w, h) on the distribution for species s. Search term: “generative model”.
And in this context a world-model might be a joint distribution over, say, all triples (weight, height, label). Though IRL there’s too much stuff in the world for us to just hold a joint distribution over everything in our heads, we have to make do with something between a Bayes net and a big ball of adhockery.
Machine learning can sorta do this, with human guidance. For instance, if we want to predict whether an animal is a dog or an elephant given its weight and its height, we could find a training set (containing a bunch of dogs and a bunch of elephants) and then fit 2 2-variate lognormal distributions to this training set—one for the dogs, and one for the elephants. (Using some sort of gradient descent, say). Then P(weight=w, height=h | species=s) is just the probability density at the point (w, h) on the distribution for species s. Search term: “generative model”.
And in this context a world-model might be a joint distribution over, say, all triples (weight, height, label). Though IRL there’s too much stuff in the world for us to just hold a joint distribution over everything in our heads, we have to make do with something between a Bayes net and a big ball of adhockery.