Cam comments on Cam’s Shortform

Cam 18 Aug 2025 15:38 UTC
3 points
Overview of post-training pipeline for Qwen3:

It seems likely that some amount of preference alignment via SFT occurs during the “cold-start” portion of post-training. However, it’s unclear what fraction of the cold-start data this comprises, or how it compares with the amount of preference training done via RLH(AI)F during Stage 4.