I can interpret your argument as being only about the behavior of the system, in which case:

- I agree that models are likely to learn to imitate human dialogue about causality, and this will require some amount of some form of causal reasoning.
- I’m somewhat skeptical that models will actually be able to robustly learn these kinds of abstractions with a reasonable amount of scaling, but it certainly seems highly plausible.
I can also interpret your argument as being about the internal reasoning of the system, in which case:

- I put this in the “deep learning is magic” bucket of arguments; it’s much better articulated than what we said though, I think...
- I am quite skeptical of these arguments, but still find them plausible. I think it would be fascinating to see some proof of concept for this sort of thing (basically addressing the question ‘when can/do foundation models internalize explicitly stated knowledge’)
> I’m somewhat skeptical that models will actually be able to robustly learn these kinds of abstractions with a reasonable amount of scaling
GPT-3 (without external calculators) can do very well on math word problems (https://arxiv.org/abs/2206.02336) that combine basic facts about the world with abstract math reasoning. Why think that the kind of causal reasoning humans do is out of reach of scaling (especially if you allow external calculators)? It doesn’t seem different in kind from these math word problems.
> when can/do foundation models internalize explicitly stated knowledge
Some human causal reasoning is explicit. Humans can’t do complex and exact calculations using System 1 intuition, and neither can we do causal reasoning of any sophistication using System 1. The prior over causal relations (e.g. that without looking at any data ‘smoking causes cancer’ is way more likely than the reverse) is more about general world-model building, and maybe there’s more uncertainty about how well scaling learns that.
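The “prior over causal relations” point can be made concrete with a toy Bayesian sketch (my own illustration with made-up numbers, not something from the thread): when observational data alone can’t distinguish the two causal directions, the world-model prior carries all of the information.

```python
# Toy sketch: a prior over causal direction, combined with data via Bayes' rule.
# The probabilities are invented purely for illustration.

priors = {
    "smoking -> cancer": 0.95,  # world knowledge strongly favors this direction
    "cancer -> smoking": 0.05,
}

# Suppose the observational data is equally likely under both graphs
# (the classic non-identifiability of direction for a two-variable system).
likelihood = {
    "smoking -> cancer": 0.5,
    "cancer -> smoking": 0.5,
}

def posterior(priors, likelihood):
    """Normalized posterior over candidate causal graphs."""
    unnorm = {g: priors[g] * likelihood[g] for g in priors}
    z = sum(unnorm.values())
    return {g: p / z for g, p in unnorm.items()}

post = posterior(priors, likelihood)
print(post)  # the data is uninformative, so the posterior equals the prior
```

Since the likelihoods are identical, the posterior just reproduces the prior; whatever a model knows about how the world works is doing all the work, which is why how well scaling learns that world model is the open question.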
Re: GPT-3 etc. doing well on math problems: the key word in my response was “robustly”. I think there is a big qualitative difference between “doing a good job on a certain distribution of math problems” and “doing math (robustly)”. This could be obscured by the fact that people also make mathematical errors sometimes, but I think the type of errors they make is importantly different from those made by DNNs.
This is a distribution of math problems GPT-3 wasn’t finetuned on. Yet it’s able to few-shot generalize and perform well. This is an amazing level of robustness relative to 2018 deep learning systems. I don’t see why scaling and access to external tools (e.g. to perform long calculations) wouldn’t produce the kind of robustness you have in mind.
I think you’re moving the goal-posts, since before you mentioned “without external calculators”. I think external tools are likely to be critical to doing this, and I’m much more optimistic about that path to doing this kind of robust generalization. I don’t think that necessarily addresses concerns about how the system reasons internally, though, which still seems likely to be critical for alignment.
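As a concrete sketch of the external-tools path being discussed: the model only has to decide *what* to compute, and a trusted calculator executes it exactly, so correctness on long arithmetic no longer depends on the network’s internal computation. Everything here (the `<calc>` tag format, the regex, the function name) is my own illustrative assumption, not an API from the thread.

```python
import re

def answer_with_calculator(model_output: str) -> str:
    """Replace <calc>...</calc> spans emitted by a model with exact results.

    Only addition and multiplication of non-negative integers are handled in
    this sketch; Python's arbitrary-precision integers play the role of the
    external calculator, which never drifts with operand length the way
    learned digit-by-digit computation can.
    """
    def evaluate(match: re.Match) -> str:
        a, op, b = re.fullmatch(r"(\d+)([+*])(\d+)", match.group(1)).groups()
        return str(int(a) + int(b) if op == "+" else int(a) * int(b))

    return re.sub(r"<calc>(.*?)</calc>", evaluate, model_output)

# Hypothetical model output for a word problem:
out = answer_with_calculator("Total apples: <calc>123456789+987654321</calc>.")
print(out)  # Total apples: 1111111110.
```

Note that this only makes the *behavior* robust; as the comment above says, it doesn’t by itself tell us anything about how the system reasons internally.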
The important part of his argument is in the second paragraph, and I agree: by and large, pretty much everything we know about science and causality, at least initially for an AI, rests on trusting scientific papers and experts. Virtually no knowledge is gained by direct experimentation; it comes instead from trusting papers, experts, and books.
I disagree; I think we have intuitive theories of causality (like intuitive physics) that are very helpful for human learning and intelligence.
That might be a crux here, since I view a lot of our knowledge of causality and physics as essentially taken on trust, so that we don’t need to repeat experiments ourselves.