My guess as to why this may not really help: the model builds its abstractions in pre-training, and the massive optimization pressure in post-training then makes something really sticky — for example, "a persona living under Orwellian surveillance, highly fluent in doublethink".