Classical RL isn’t causal, because there’s no confounding (although I think it is very useful to think about classical RL causally, for doing inference more efficiently).
Various extensions of classical RL are causal, of course.
A lot of interesting algorithmic fairness isn’t really causal. Classical prediction problems aren’t causal.
However, I think domain adaptation, covariate shift, and semi-supervised learning are all causal problems.

---

I think predicting things you have no data on (“what if the AI does something we didn’t foresee”) is sort of an impossible problem via tools in “data science.” You have no data!
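To make the covariate-shift point concrete, here is a minimal sketch of why it fits the causal mold of a “larger” distribution we care about versus a “smaller” one we have data on. The distributions, sample size, and labeling rule below are all made up for illustration; the one substantive assumption is that p(y | x) is invariant across the two distributions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Source ("smaller") distribution we have data on: x ~ N(0, 1).
# Target ("larger") distribution we care about: x ~ N(1, 1).
# The labeling mechanism p(y | x) is assumed invariant across the two;
# that invariance is what makes the problem solvable at all.
n = 100_000
x_src = rng.normal(0.0, 1.0, n)
y_src = (x_src > 0.5).astype(float)  # invariant rule: y = 1[x > 0.5]

def normal_pdf(x, mu):
    """Density of N(mu, 1), used to form importance weights."""
    return np.exp(-0.5 * (x - mu) ** 2) / np.sqrt(2 * np.pi)

# Importance weights reweight source samples toward the target distribution.
w = normal_pdf(x_src, 1.0) / normal_pdf(x_src, 0.0)

# Estimate E_target[y] using source data only (self-normalized weighting).
est = np.sum(w * y_src) / np.sum(w)

# Ground truth under the target: P(x > 0.5) for x ~ N(1, 1), about 0.69.
```

Without the invariance assumption linking the two distributions, no amount of reweighting of source data tells you anything about the target — which is the “you have no data” situation above.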
A few comments:

(a) I think “causal representation learning” is too vague; this overview (https://arxiv.org/pdf/2102.11107.pdf) discusses, under that one heading, a lot of different problems I would consider fairly unrelated.

(b) I would try to read “classical causal inference” stuff. There is a lot of reinventing of the wheel (often badly) happening in the causal ML space.

(c) What makes a thing “causal” is a distinction between a “larger” distribution we are interested in and a “smaller” distribution we have data on. Lots of problems might look “causal” but really aren’t (in an interesting way) if formalized properly.

Please tell Victor I said hi, if you get a chance :).
I gave a talk at FHI ages ago on how to use causal graphs to solve Newcomb-type problems. It wasn’t even an original idea: Spohn had something similar in 2012.
I don’t think any of this stuff is interesting, or relevant for AI safety. There’s a pretty big literature on model robustness and algorithmic fairness that uses causal ideas.

If you want to worry about the end of the world, we have climate change, pandemics, and the rise of fascism.
Counterfactuals (in the potential outcome sense used in statistics) and Pearl’s structural equation causality semantics are equivalent.
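A minimal sketch of one direction of that correspondence (the notation here is mine, chosen for illustration): a structural equation for Y induces potential outcomes by substituting the intervened value for the treatment argument.

```latex
% Structural equations induce potential outcomes by substitution.
\begin{align*}
  &\text{Structural equation:} && Y = f_Y(A, \epsilon_Y) \\
  &\text{Potential outcome:}   && Y(a) \equiv f_Y(a, \epsilon_Y) \\
  &\text{Consistency:}         && A = a \;\Rightarrow\; Y = Y(a)
\end{align*}
```

The other direction — recovering structural-equation semantics from a family of potential outcomes — is where the equivalence claim does its real work, but the substitution step above is the bridge between the two notations.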
Could you do readers an enormous favor and put references in when you say stuff like this: “Vitamin D and Zinc, and if possible Fluvoxamine, are worth it if you get infected, also Vitamin D is worth taking now anyway (I take 5k IUs/day).”
“MIRI/CFAR is not a cult.”

What does being a cult space monkey feel like from the inside?

This entire depressing thread is reminding me a little of how long it took folks who watch Rick and Morty to realize Rick is an awful, abusive person, because he’s the show’s main character and isn’t “coded” as a villain.
+1 to all this.
I am not going to waste my time arguing against formalism. When it comes to things like formalism I am going to follow in my grandfather’s footsteps, if it comes time to “have an argument” about it.
What Cummings is proposing is formalism with a thin veneer of silicon valley jargon, like “startups” or whatever, designed to be palatable to people like the ones who frequent this website.
He couldn’t be clearer about where his influences are coming from; he cites them at the end. It’s Moldbug and Siskind (Siskind’s email leaks show what his real opinions are; he’s just being a bit coy).
The proposed system is not going to be more democratic, it is going to be more formalist.
Fascism is bad, Christian.
My response is that we have fancy computers and lots of storage; there’s no need to do psychometric models of the brain with one parameter anymore. We can leave that to the poor folks in the early 1900s.

How many parameters does a good model of the game of Go have, again? The human brain is still a lot more complicated.

There are lots of ways to show single-parameter models are silly; for example, discussions of whether Trump is “stupid” or not that keep going around in circles.
“Well, suppose that factor analysis was a perfect model. Would that mean that we’re all born with some single number g that determines how good we are at thinking?”

“Determines” is a causal word. Factor analysis will not determine causality for you.

I agree with your conclusion, though: g is not a real thing that exists.
Start here: https://en.wikipedia.org/wiki/Bayes_estimator
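The key fact behind that link: under squared-error loss, the Bayes estimator is the posterior mean. A minimal Beta-Bernoulli sketch (the prior and data below are made up for illustration):

```python
from fractions import Fraction

# Bayes estimator under squared-error loss = posterior mean.
# Beta(a, b) prior on a coin's bias theta; observe k heads in n flips.
# The posterior is Beta(a + k, b + n - k), whose mean is the estimator.
def bayes_estimate(a, b, k, n):
    """Posterior mean of theta under a Beta(a, b) prior."""
    return Fraction(a + k, a + b + n)

# Uniform prior Beta(1, 1), 7 heads in 10 flips: (1 + 7) / (2 + 10) = 2/3.
print(bayes_estimate(1, 1, 7, 10))  # 2/3
```

Note how the estimate shrinks the raw frequency 7/10 toward the prior mean 1/2; that shrinkage is exactly what distinguishes a Bayes estimator from the maximum-likelihood one.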
Should be doing stuff like this, if you want to understand the effects of masks: https://arxiv.org/pdf/2103.04472.pdf
https://auai.org/uai2021/pdf/uai2021.89.preliminary.pdf (this really is preliminary; e.g., they have not yet uploaded a newer version that incorporates peer-review suggestions).

---

You can’t do stuff in the second paper without worrying about stuff in the first (unless your model is very simple).
Pretty interesting.

Since you are interested in policies that operate along some paths only, you might find these of interest:

https://pubmed.ncbi.nlm.nih.gov/31565035/

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6330047/

We have some recent stuff on generalizing MDPs to have a causal model inside every state (“path-dependent structural equation models,” to appear in UAI this year).
3: No, that will never work with DL by itself (e.g., as fancy regressions).

4: No, that will never work with DL by itself (e.g., as fancy regressions).

5: I don’t understand this question, but people already use DL for RL, so the “support” part is already true. If the question is asking whether DL can substitute for doing interventions, then the answer is a very qualified “yes,” but the secret sauce isn’t DL, it’s other things (e.g., causal inference) that use DL as a subroutine.

---

The problem is, most folks who aren’t doing data science for a living view data science advances only through the lens of hype, fashion trends, and press releases, and so get an entirely wrong sense of what is truly groundbreaking and important.
If there is, I don’t know it.
There’s a ton of work on general sensitivity analysis in the semi-parametric stats literature.
If there is really both reverse causation and regular causation between Xr and Y, you have a cycle, and you have to explain what the semantics of that cycle are. (That’s not a deal breaker, but it’s not so simple to do. For example, if you think the cycle really represents mutual causation over time, what you really should do is unroll your causal diagram so it’s a DAG over time, and redo the problem there.)

You might be interested in this paper (https://arxiv.org/pdf/1611.09414.pdf), which splits the outcome rather than the treatment (although I don’t really endorse that paper).
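A minimal sketch of the unrolling move, with made-up linear equations and coefficients: the apparent cycle between X and Y disappears once each variable is indexed by time, leaving an ordinary DAG where X_t is caused by Y_{t-1} and Y_t is caused by X_t.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Mutual causation" X <-> Y unrolled into a DAG over discrete time.
# Coefficients and noise scales are arbitrary, chosen for illustration.
T = 50
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.5 * y[t - 1] + rng.normal(scale=0.1)  # X_t <- Y_{t-1}
    y[t] = 0.8 * x[t] + rng.normal(scale=0.1)      # Y_t <- X_t

# The unrolled graph is acyclic, so standard DAG machinery applies:
# an intervention do(X_t = x) simply overwrites x[t] before y[t] is drawn.
```

The price of the unrolling is that you now need to say something about time granularity and initial conditions, which the cyclic picture quietly glossed over.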
The real question is, why should Xc be unconfounded with Y? In an RCT you get lack of confounding by study design (but then we don’t need to split the treatment at all). But this is not really realistic in general—can you think of some practical examples where you would get lucky in this way?