Some part of this is that capabilities are not linear, and from what I gather the newer internal models may be less polished (if more capable) than the ones they make public. Especially now that more of the value-add is in post-training, I suspect using the work-in-progress models only feels good closer to release.
Yes, and perhaps one would usually want to shrink before post-training, both to make each post-training iteration more affordable and because I am not sure post-training-acquired capabilities survive shrinkage as well as pre-training-acquired ones do. I wonder what is known about that; I'd like to understand that aspect better. Is it insane to postpone shrinkage until after post-training, or is it something worth trying?
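To make the two orderings concrete, here is a toy sketch in plain PyTorch of "shrink then post-train" versus "post-train then shrink". The tiny models, random inputs, and loss weights are all made up for illustration; it only shows the pipeline shapes being compared, not any lab's actual recipe.

```python
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins: a "large" pretrained teacher and a "small" student classifier.
def make_model(hidden):
    return nn.Sequential(nn.Linear(16, hidden), nn.ReLU(), nn.Linear(hidden, 4))

def distill(teacher, student, steps=100, temperature=2.0):
    """Shrink: train the student to match the teacher's softened outputs."""
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(32, 16)                     # random inputs stand in for distillation data
        with torch.no_grad():
            t_logits = teacher(x) / temperature
        s_logits = student(x) / temperature
        loss = F.kl_div(F.log_softmax(s_logits, dim=-1),
                        F.softmax(t_logits, dim=-1), reduction="batchmean")
        opt.zero_grad(); loss.backward(); opt.step()
    return student

def post_train(model, steps=100):
    """Post-train: supervised fine-tuning on (toy) labeled task data."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(steps):
        x = torch.randn(32, 16)
        y = torch.randint(0, 4, (32,))              # toy labels stand in for post-training data
        loss = F.cross_entropy(model(x), y)
        opt.zero_grad(); loss.backward(); opt.step()
    return model

teacher = make_model(hidden=512)                    # pretend this is the pretrained large model

# Order A: shrink first, then post-train the small model (each post-training
# iteration is cheaper because it runs on the small model).
small_a = post_train(distill(teacher, make_model(hidden=32)))

# Order B: post-train the large model, then shrink it.  The open question above
# is how well the post-training-acquired behaviour survives this last step.
small_b = distill(post_train(copy.deepcopy(teacher)), make_model(hidden=32))
```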