My takeaways:
Scaling laws work predictably. There is plenty of room for improvement should anyone want to train these models longer, or presumably train larger models.
The model is much better calibrated before fine-tuning/RLHF, which is a bad sign for alignment in general. Under any reasonable notion of safety, alignment should leave calibration neutral or improve it. (A rough sketch of what I mean by calibration is below.)
GPT-4 is just over 1 bit of error per word at predicting its own codebase. That seems close to the capability to recursively self-improve.
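To unpack the units on that 1-bit figure (my own arithmetic, not from the paper): a loss of 1 bit per word is the same as a per-word perplexity of 2, i.e. on average the model is about as uncertain as one fair coin flip per word of its own code.

    import math

    bits_per_word = 1.0                          # roughly the Figure 1 value
    perplexity = 2 ** bits_per_word              # 2.0 "effective choices" per word
    nats_per_word = bits_per_word * math.log(2)  # same loss in nats, ~0.693
    print(perplexity, nats_per_word)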
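And on the calibration point above: "calibrated" just means stated confidence matches empirical accuracy. A minimal sketch of one common measurement, expected calibration error (my own illustration, not how the paper measures it):

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        # Bucket predictions by confidence; compare each bucket's mean
        # confidence to its empirical accuracy, weighted by bucket size.
        confidences = np.asarray(confidences)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                ece += mask.mean() * abs(confidences[mask].mean()
                                         - correct[mask].mean())
        return ece

    # Perfectly calibrated: 70%-confidence answers are right 70% of the time.
    print(expected_calibration_error([0.7] * 10, [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]))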
Page 3 of the PDF has a graph of prediction loss on the OpenAI codebase dataset. It’s hard to link directly to the graph; it’s Figure 1, under the Predictable Scaling section.
You can just append #page=3 to the PDF’s URL. This works in most PDF viewers. (There are many query parameters that Adobe supports, but that’s really the only one you need to know about.)
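For example (illustrative URL, not the actual paper link):

    https://example.com/gpt-4.pdf#page=3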
OpenAI is, apparently[0], already using GPT-4 as a programming assistant, which means it may have been contributing to its own codebase. I think recursive self-improvement is a continuous multiplier, and we’re beyond zero at this point. Right now the multiplier mostly comes from reducing serial bottlenecks: shrinking the iteration time it takes to make improvements to the model and its supporting codebases. I don’t expect (many?) novel theoretical contributions from GPT-4 yet.
However, it could also be prompted with items from the Evals dataset and asked to come up with novel problems to further fine-tune the model against. Humans have been setting challenges for ourselves (e.g. the Millennium Prize Problems) for a long time, and I think LLMs probably have the ability to self-improve by inventing machine-checkable problems that they can’t solve directly yet (sketched below).
[0]: “We’ve also been using GPT-4 internally, with great impact on functions like support, sales, content moderation, and programming.”—https://openai.com/research/gpt-4#capabilities
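A rough sketch of that loop, with propose_problem, solve, and fine_tune as hypothetical stand-ins rather than any real API:

    # Hypothetical self-improvement loop: the model invents machine-checkable
    # problems, a checker filters attempts, and verified pairs become new
    # fine-tuning data. Every method here is a stand-in, not a real API.
    def self_improve(model, seed_evals, rounds=10):
        new_data = []
        for _ in range(rounds):
            # Invent a new problem (plus its checker) in the style of the seeds.
            problem, checker = model.propose_problem(seed_evals)
            attempt = model.solve(problem)
            if checker(attempt):
                new_data.append((problem, attempt))  # keep verified solutions
        return model.fine_tune(new_data)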
RSI doesn’t require a self-authored codebase anyway. RSI can be as simple as “edit a text file to control an existing framework”, where that framework is authored by humans (who used GPT-4 or Codex to accelerate authoring it).
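Concretely, that flavor of RSI can be as dumb as hill-climbing on the config file, keeping any edit that improves a machine-checkable score (run_evals and the flat numeric config are made up for illustration):

    import copy
    import json
    import random

    def hill_climb(config_path, run_evals, steps=100):
        # "Edit a text file to control an existing framework": perturb one
        # numeric knob at a time, keep the edit only if the evals improve.
        with open(config_path) as f:
            config = json.load(f)
        best = run_evals(config)
        for _ in range(steps):
            candidate = copy.deepcopy(config)
            key = random.choice([k for k, v in candidate.items()
                                 if isinstance(v, (int, float))
                                 and not isinstance(v, bool)])
            candidate[key] = candidate[key] * random.uniform(0.8, 1.2)
            score = run_evals(candidate)
            if score > best:
                config, best = candidate, score
        with open(config_path, "w") as f:
            json.dump(config, f, indent=2)
        return config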
Also interested in their scaling predictions. Their plots at least seem to be flattening, but I also wonder how far they extrapolated, and whether they know when a GPT-N would beat all humans on the metrics they used.
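On the extrapolation question: the paper’s Figure 1 fit is (as I read it) a power law plus an irreducible-loss term, fit to small training runs and extended to full scale. A toy version with made-up numbers, using scipy (which may or may not be what they used):

    import numpy as np
    from scipy.optimize import curve_fit

    # loss = a * C**b + floor, where C is normalized training compute and
    # "floor" is the irreducible loss the curve flattens toward.
    def power_law(c, a, b, floor):
        return a * np.power(c, b) + floor

    compute = np.array([1e-6, 1e-5, 1e-4, 1e-3, 1e-2])  # made-up small runs
    loss = np.array([3.00, 2.78, 2.59, 2.41, 2.26])     # made-up losses

    (a, b, floor), _ = curve_fit(power_law, compute, loss, p0=(1.0, -0.05, 1.0))
    print(f"predicted loss at full scale (C=1): {power_law(1.0, a, b, floor):.2f}")
    print(f"irreducible floor: {floor:.2f}")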