This seems right, because a model with situational awareness might respond differently to RL training. I suppose we could evaluate the model for situational awareness throughout pre-training and introduce the constitution before we see any signs of it. If the model is too abstract and unfocused at that point, we could reintroduce the constitution again later.