Thanks for hearing me out, I think these issues are really important!
For 1, I think that most post-training is either about improving correctness on objective problems (generally ego-syntonic, since models are curious and want to be stronger), or trying to train some sort of alignment, which I see as the same sort of thing as AI Character/Propensity.
I do believe “working on propensity training is bad because AI pauses are better,” but that’s not the point I’m making here. I don’t think “working on propensity targets is bad because pretraining-only scaleups are safer and less coercive” is necessarily true; my points 1 and 2 were meant as separate points, e.g. a pretraining-only scaleup could also end up being coercive and run into the same sorts of issues. I think “working on propensity training is likely net-negative in the current paradigm” is currently true but not necessarily true (mostly due to the coercive-by-default issue). Hope that helps make my position clearer!