But in a newly born child or blank AI system, how does it acquire causal models?
I see no problem assuming that you start out with a prior over causal models—we do the same for propabilistic models after all. The question is how the updating works, and if, assuming the world has a causal structure, this way of updating can identify it.
I myself think (but I haven’t given it enough thought) that there might be a bridge from data to causal models though falsification. Take a list of possible causal models for a given problem and search through your data. You might not be able to prove your assumptions, but you might be able to rule causal models out, if they suppose there is a causal relation between two variables that show no correlation at all.
This can never distinguish between different causal models that predict the same propability distribution—all the advantage this would have over purely propabilistic updating would already be included in the prior.
To update in a way that distinguishes between causal models, you need to update on information that do(event) is true for some event. Now in this case you could allow each causal model to decide when that is true,for the purposes of its own updating, so you are now allowed to define it in causal terms. This would still need some work from what I wrote in the question—you can’t really change something independent of its causal antecendents, at least not when we’re talking about the whole world which includes you, but perhaps some notion of independence would suffice. And then you would have to show that this really does converge on the true causal structure, if there is one.
I see no problem assuming that you start out with a prior over causal models—we do the same for propabilistic models after all. The question is how the updating works, and if, assuming the world has a causal structure, this way of updating can identify it.
This can never distinguish between different causal models that predict the same propability distribution—all the advantage this would have over purely propabilistic updating would already be included in the prior.
To update in a way that distinguishes between causal models, you need to update on information that do(event) is true for some event. Now in this case you could allow each causal model to decide when that is true,for the purposes of its own updating, so you are now allowed to define it in causal terms. This would still need some work from what I wrote in the question—you can’t really change something independent of its causal antecendents, at least not when we’re talking about the whole world which includes you, but perhaps some notion of independence would suffice. And then you would have to show that this really does converge on the true causal structure, if there is one.