David Scott Krueger comments on Causal confusion as an argument against the scaling hypothesis

David Scott Krueger 21 Jun 2022 17:05 UTC
LW: 1 AF: 1
Re GPT-3 etc. doing well on math problems: the key word in my response was “robustly”. I think there is a big qualitative difference between “doing a good job on a certain distribution of math problems” and “doing math (robustly)”. This could be obscured by the fact that people also make mathematical errors sometimes, but I think the types of errors people make are importantly different from those made by DNNs.
- Owain_Evans 21 Jun 2022 18:00 UTC
  LW: 5 AF: 3
  This is a distribution of math problems GPT-3 wasn’t finetuned on. Yet it’s able to few-shot generalize and perform well. This is an amazing level of robustness relative to 2018 deep learning systems. I don’t see why scaling and access to external tools (e.g. to perform long calculations) wouldn’t produce the kind of robustness you have in mind.
  - David Scott Krueger 22 Jun 2022 21:42 UTC
    LW: 2 AF: 1
    I think you’re moving the goalposts, since earlier you mentioned “without external calculators”. I think external tools are likely to be critical to achieving this kind of robust generalization, and I’m much more optimistic about that path. I don’t think that necessarily addresses concerns about how the system reasons internally, though, which still seems likely to be critical for alignment.