williawa answers: How load-bearing is KL divergence from a known-good base model in modern RL?