gwern comments on Impressions from base-GPT-4?

gwern 11 Nov 2023 15:36 UTC
6 points
2

But the reports specifically on GPT-3.5-turbo fine-tuning announced in August were glowing, with people reporting being able to reach GPT-4-like levels on performance in narrow domains.

Indeed, but only years after their original attempt. All of the early GPT-3 finetuning reports were very… meh. No one seemed terribly happy with it.

That’s my point: it seems like the first attempts did not go well for GPT-3. So, it’s not clear that the first attempts going poorly for GPT-4 is anything different. Perhaps in another 3 years, OA will have a new GPT-4 finetuning service which doesn’t require “more work” and Just Works™. (One does hope it wouldn’t take that long the second time around.)
- gwern 24 Aug 2024 2:08 UTC
  13 points
  1
  Parent
  OA does have a new finetuning service for GPT-4o, and people seem to be happier with it, but OA has also apparently confirmed that it’s a LoRA (as I was speculating about it being a cheap shallow hack rather than true finetuning): https://x.com/CFGeek/status/1826749739502895618 https://www.youtube.com/watch?v=X57GT1Y5URY&t=2479s
  
  It also is doing shenanigans behind the scenes like trying to dynamically guess a size but apparently hiding that from you if you aren’t a favored customer: https://x.com/CFGeek/status/1826749748549988800
  
  So, I continue to maintain that OA “finetuning” is unfit for research* and for any purposes that involve deep transformation of the model rather than ‘locating’ an existing capability. Especially now that Llama-3-405b has been released and you can finetune that yourself and be sure that it genuinely is finetuning rather than a pinchbeck substitute.
  
  * ie. it can be OK if you have an extremely specific claim like ‘the OA blackbox finetuning service does or does not do X’; but it is totally illegitimate to argue ‘GPT-4 cannot do X as proven by our OA-finetuned version still not doing X’, which is the usual way it comes up in DL research. At best, it is a loose lower bound, and should be treated no more seriously than lazy garbage arguments like ‘we tried a few prompts and X didn’t work, therefore, LLMs will never do X’.
  - mishka 24 Aug 2024 4:35 UTC
    2 points
    0
    Parent
    Thanks, that’s very useful to know!
  - anaguma 24 Aug 2024 18:13 UTC
    1 point
    0
    Parent
    It’s still not trivial to finetune Llama 405B. You require 16 bytes/parameter using Adam + activation memory, so a minimum of ~100 H100s.
    - gwern 24 Aug 2024 20:31 UTC
      6 points
      1
      Parent
      There are lots of people working on it and offering or will be offering it. And even when they aren’t offering true finetuning, it’s still better: Snowflake (first hit in google for “Llama 405B finetuning”) for example is making no bones about their single-node lightweight-finetuning being a LoRA, and is open sourcing code upfront so at least you know what it is now—instead of depending on borderline-gossip buried 40 minutes into a Youtube video months/years later.