You’re seeking an algorithm F that, given observations O, gives you a correct causal model M. Each observation has a cost, and each manipulation of a variable (an experimental case) has a large cost. You are seeking an algorithm with a good ratio of effectiveness to cost.
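Concretely, the objective could be scored like the minimal Python sketch below. Everything in it is a hypothetical illustration rather than an established benchmark: the Observation class, the edge-F1 effectiveness proxy, and the cost numbers are all stand-ins.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    data: dict          # variable name -> observed value
    experimental: bool  # True if some variable was manipulated

def edge_f1(estimated_edges, true_edges):
    """Effectiveness proxy: F1 score on the recovered set of directed edges."""
    true_positives = len(estimated_edges & true_edges)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(estimated_edges)
    recall = true_positives / len(true_edges)
    return 2 * precision * recall / (precision + recall)

def score(algorithm, observations, true_edges,
          obs_cost=1.0, experiment_cost=100.0):
    """Effectiveness-to-cost ratio for one case: the algorithm F maps the
    observations O to a causal model M, represented here as an edge set."""
    estimated_edges = algorithm(observations)
    cost = sum(experiment_cost if o.experimental else obs_cost
               for o in observations)
    return edge_f1(estimated_edges, true_edges) / cost
```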
So I have an idea. You have a large number of cases: places where there is correlation, where there is causation, where confounding factors are large, and where they are small.
To me this sounds like you can find a better model by generating a ‘benchmark’ of a large number of randomized situations, simulating the noisy observations you would see in each, and finding out which algorithms discover the true relationships best. Might be easier than theorizing with flowcharts.
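A rough sketch of what I mean, assuming random linear models with Gaussian error as the ‘randomized situations’ (a real benchmark would need richer mechanisms; this reuses the hypothetical edge_f1 from the sketch above):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_linear_scm(n_vars=5, edge_prob=0.3):
    """Random DAG with linear mechanisms; weights[i, j] != 0 means i -> j."""
    weights = np.triu(rng.uniform(0.5, 2.0, (n_vars, n_vars)), k=1)
    weights *= rng.random((n_vars, n_vars)) < edge_prob
    return weights

def simulate(weights, n_samples=500, noise=1.0):
    """Generate observations with additive Gaussian error, in causal order."""
    n_vars = weights.shape[0]
    data = np.zeros((n_samples, n_vars))
    for j in range(n_vars):  # upper-triangular weights: 0..n-1 is a causal order
        data[:, j] = data @ weights[:, j] + noise * rng.standard_normal(n_samples)
    return data

def benchmark(algorithm, n_cases=100):
    """Average edge-recovery F1 of an algorithm over random true models."""
    scores = []
    for _ in range(n_cases):
        weights = random_linear_scm()
        true_edges = set(zip(*np.nonzero(weights)))
        estimated_edges = algorithm(simulate(weights))  # must return an edge set
        scores.append(edge_f1(estimated_edges, true_edges))
    return float(np.mean(scores))
```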
😅 I think that is a very optimistic framing of the problem.
The hard part isn’t really weighing the costs of different known observables to find efficient ways to study things; the hard part is figuring out what observables there are and how to use them correctly.
I don’t think this is particularly viable algorithmically; it seems like an AI-complete problem. (Though of course one should be careful about claiming that, as often AI-complete things turn out to not be so AI-complete anyway.)
The core motivation for the series of blog posts I’m writing is that I’ve been trying to study various things that require empirical causal inference, and so I need to apply theory to figure out how to do this. But I find existing theory to be somewhat ad hoc: it provides a lot of tools with a lot of assumptions, but lacks an overall picture. This is fine if you just want to apply some specific tool, as you can then learn lots of details about that tool. But if you want to study a phenomenon, you instead need some way to map what you already know about the phenomenon to an appropriate tool, which requires a broader overview.
This post is just one in a series. (Hopefully, at least—I do tend to get distracted.) It points out a key requirement for a broad range of methods—having some cause of interest where you know something about how the cause varies. I’m hoping to create a checklist with a broad range of causal inference methods and their key requirements. (Currently I have 3 other methods in mind that I consider to be completely distinct from this method, as well as 2 important points of critique on this method that I think usually get lost in the noise of obscure technical requirements for the statistics to be 100% justified.)
Regarding “theorizing with flowcharts”, I tend to find it pretty easy. Perhaps it’s something that one needs to get used to, but graphs are a practical way to summarize causal assumptions. Generating data may of course be helpful too, and I intend to do this in a later blog post, but it quickly gets unwieldy: there are many parameters that can vary in the generated data, and they all need to be explored to ensure adequate coverage of the possibilities.
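To give a sense of the blow-up, here is a hypothetical parameter grid (every knob and value below is made up for illustration); even a handful of axes multiplies into hundreds of cases before any replications:

```python
from itertools import product

# Hypothetical knobs for simulated benchmark data; each additional axis
# multiplies the number of cases needed for adequate coverage.
grid = {
    "n_vars":      [3, 5, 10],
    "edge_prob":   [0.1, 0.3, 0.5],
    "noise_scale": [0.1, 1.0, 10.0],
    "n_samples":   [100, 1000, 10000],
    "confounding": ["none", "weak", "strong"],
}

settings = list(product(*grid.values()))
print(len(settings))  # 3**5 = 243 combinations already, before replications
```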
I wasn’t saying a flowchart wasn’t helpful. I was saying if you want to find an algorithm to solve the problem, which is obtaining information about causal relationships at the lowest cost, you need to do it numerically.
This problem is very solvable, as you are simply seeking an algorithm with the best score on a heuristic for accuracy and cost, where “solvable” means “matches or exceeds the state of the art”.
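In that framing, “matches or exceeds the state of the art” reduces to a comparison on a shared benchmark. A minimal sketch, reusing the hypothetical benchmark() from above:

```python
def select_algorithm(candidates, baseline, n_cases=200):
    """Pick the candidate that matches or exceeds the baseline on the
    benchmark; in practice, fix the random cases so every algorithm
    is scored on the same ones."""
    scores = {name: benchmark(algo, n_cases) for name, algo in candidates.items()}
    best_name = max(scores, key=scores.get)
    if scores[best_name] >= benchmark(baseline, n_cases):
        return best_name
    return "baseline"
```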