Influence functions are for problems where you have a mismatch between the loss of the target parameter you care about and the loss of the nuisance function you must fit to get the target parameter.

# IlyaShpitser

It’s simple. “You” (the rationalist community) are selected for being bad at making wisdom saving throws, so to speak.

You know, let’s look at Yudkowsky, with all of his very public, very obvious character dysfunction and go “yes, this is the father figure/Pope I need to change my life.”

The only surprise here is the type of stuff you are agonizing about didn’t happen earlier, and isn’t happening more often.

It’s important to internalize that the intellectual world lives in the attention economy, like everything else.

Just like “content creators” on social platforms think hard about capturing and keeping attention, so do intellectuals and academics. Clarity and rigor are part of that.

No one has time, energy, (or crayons, as the saying goes) for half-baked ramblings on a blog or forum somewhere.

If you think you can beat the American __ Association over a long run average, that’s great news for you! That means free money!

Being right is super valuable, and you should monetize it immediately.

---

Anything else is just hot air.

Lots of Bayes fans, but can’t seem to define what Bayes is.

Since Bayes’ theorem is a reformulation of the chain rule, anything that is probabilistic “uses Bayes’ theorem” somewhere, including all frequentist methods.
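
To spell out the chain rule point (a standard identity, written here for two generic events or variables a and b, which are my notation and not tied to any particular method): factor the joint both ways and divide.

$$
p(a, b) = p(a \mid b)\, p(b) = p(b \mid a)\, p(a)
\quad\Longrightarrow\quad
p(a \mid b) = \frac{p(b \mid a)\, p(a)}{p(b)}, \qquad p(b) > 0.
$$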

Frequentists quantify uncertainty also, via confidence sets, and other ways.

Continuous updating has to do with “online learning algorithms,” not Bayes.

---

Bayes is when the target of inference is a posterior distribution. Bonus Bayes points: you don’t care about frequentist properties like consistency of the estimator.
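
A minimal sketch of what “the target of inference is a posterior distribution” can look like, assuming a Beta prior on a coin’s bias and made-up flip counts. The function names and numbers are hypothetical, chosen only for illustration. The output of the analysis is the whole updated distribution, summarized here by its mean and variance, not a single point estimate.

```python
from fractions import Fraction


def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta(a, b) prior plus Bernoulli data gives Beta(a + s, b + f)."""
    return prior_a + successes, prior_b + failures


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return Fraction(a, a + b)


def beta_variance(a, b):
    """Variance of a Beta(a, b) distribution."""
    return Fraction(a * b, (a + b) ** 2 * (a + b + 1))


# Assumed setup: uniform Beta(1, 1) prior, then 7 heads and 3 tails observed.
post_a, post_b = beta_posterior(1, 1, successes=7, failures=3)
post_mean = beta_mean(post_a, post_b)
post_var = beta_variance(post_a, post_b)

print(f"posterior: Beta({post_a}, {post_b}), mean {post_mean}, variance {post_var}")
```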

Does your argument fail for https://en.wikipedia.org/wiki/Goldbach%27s_weak_conjecture?

If so, can you explain why? If not, it seems your argument is no good, as a good proof of this (weaker) claim exists.

Not that you asked my advice, but I would stay away from number theory unless you get a lot of training.

For the benefit of other readers: this post is confused.

Specifically on this (although possibly also on other stuff): (a) causal and statistical DAGs are fundamentally not the same kind of object, and (b) no practical decision theory used by anyone includes the agent inside the DAG in the way this post describes.

---

“So *if* the EDT agent can find a causal structure that reflects their (statistical) beliefs about the world, then they will end up making the same decision as a CDT agent who believes in the same causal structure.”

A → B → C and A ← B ← C reflect the same statistical beliefs about the world.
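
A quick numerical check of this, as a sketch with made-up conditional probability tables for binary variables (all numbers and helper names below are assumptions for illustration). It builds the joint from the A → B → C factorization, then rebuilds it from the reverse factorization p(c) p(b | c) p(a | b) and confirms the two agree.

```python
from itertools import product

# Hypothetical tables for the forward chain A -> B -> C (binary variables).
p_a = {0: 0.3, 1: 0.7}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}    # indexed [a][b]
p_c_given_b = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}  # indexed [b][c]

# Joint distribution implied by p(a) p(b | a) p(c | b).
joint = {
    (a, b, c): p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]
    for a, b, c in product((0, 1), repeat=3)
}


def marginal(dist, keep):
    """Sum the joint over every variable whose position is not in `keep`."""
    out = {}
    for assignment, prob in dist.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


p_c = marginal(joint, (2,))
p_bc = marginal(joint, (1, 2))
p_ab = marginal(joint, (0, 1))
p_b = marginal(joint, (1,))

# Same joint, refactorized along A <- B <- C: p(c) p(b | c) p(a | b).
reverse_joint = {
    (a, b, c): p_c[(c,)] * (p_bc[(b, c)] / p_c[(c,)]) * (p_ab[(a, b)] / p_b[(b,)])
    for a, b, c in product((0, 1), repeat=3)
}

max_gap = max(abs(joint[k] - reverse_joint[k]) for k in joint)
print(f"largest discrepancy between the two factorizations: {max_gap:.2e}")
```

Both factorizations encode only the independence of A and C given B, which is why observational data alone cannot tell the two graphs apart.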

If you think it’s a hard bet to win, you are saying you agree that nothing bad will happen. So why worry?

Wanna bet some money that nothing bad will come of any of this on the timescales you are worried about?

Big fan of Galeev.

Some reading on this:

https://csss.uw.edu/files/working-papers/2013/wp128.pdf

http://proceedings.mlr.press/v89/malinsky19b/malinsky19b.pdf

https://arxiv.org/pdf/2008.06017.pdf

---

From my experience it pays to learn how to think about causal inference like Pearl (graphs, structural equations), and *also* how to think about causal inference like Rubin (random variables, missing data). Some insights only arise from a synthesis of those two views.

Pearl is a giant in the field, but it is worth remembering that he’s unusual in another way (compared to a typical causal inference researcher) -- he generally doesn’t worry about actually analyzing data.

---

By the way, Gauss figured out not only the normal distribution trying to track down Ceres’ orbit, he actually developed the least squares method, too! So arguably the entire loss minimization framework in machine learning came about from thinking about celestial bodies.

Classical RL isn’t causal, because there’s no confounding (although I think it is very useful to think about classical RL causally, for doing inference more efficiently).

Various extensions of classical RL are causal, of course.

A lot of interesting algorithmic fairness isn’t really causal. Classical prediction problems aren’t causal.

However, I think domain adaptation, covariate shift, semi-supervised learning are all causal problems.

---

I think predicting things you have no data on (“what if the AI does something we didn’t foresee”) is sort of an impossible problem via tools in “data science.” You have no data!

A few comments:

(a) I think “causal representation learning” is too vague, this overview (https://arxiv.org/pdf/2102.11107.pdf) talks about a lot of different problems I would consider fairly unrelated under this same heading.

(b) I would try to read “classical causal inference” stuff. There is a *lot* of reinventing of the wheel (often, badly) happening in the causal ML space.

(c) What makes a thing “causal” is a distinction between a “larger” distribution we are interested in, and a “smaller” distribution we have data on. Lots of problems might look “causal” but really aren’t (in an interesting way) if formalized properly.

Please tell Victor I said hi, if you get a chance :).

I gave a talk at FHI ages ago on how to use causal graphs to solve Newcomb type problems. It wasn’t even an original idea: Spohn had something similar in 2012.

I don’t think any of this stuff is interesting, or relevant for AI safety. There’s a pretty big literature on model robustness and algorithmic fairness that uses causal ideas.

If you want to worry about the end of the world, we have climate change, pandemics, and the rise of fascism.

Counterfactuals (in the potential outcome sense used in statistics) and Pearl’s structural equation causality semantics are equivalent.

Could you do readers an enormous favor and put references in when you say stuff like this:

“Vitamin D and Zinc, and if possible Fluvoxamine, are worth it if you get infected, also Vitamin D is worth taking now anyway (I take 5k IUs/day).”

“MIRI/CFAR is not a cult.”

What does being a cult space monkey feel like from the inside?

This entire depressing thread is reminding me a little of how long it took folks who watch Rick and Morty to realize Rick is an awful abusive person, because he’s the show’s main character, and isn’t “coded” as a villain.

+1 to all this.

I am not going to waste my time arguing against formalism. When it comes to things like formalism I am going to follow in my grandfather’s footsteps, if it comes time to “have an argument” about it.

Nate’s an asshole, and this is cult dynamics. Make your wisdom saving throws, folks.