This might be a dumb question (or questions); I’m struggling to focus today and my linear algebra is rusty.
Is the observation that ‘you can do feature ablation via weight orthogonalization’ a new one?
It seems to me that feature ablation via weight orthogonalization is a pretty powerful tool that could be applied to any linearly represented feature. It could be useful for modulating those features, and it's another way to run ablations to validate a feature (part of the ‘how do we know we’re not fooling ourselves about our results’ toolkit). Does that seem right? Or does it not actually add much?
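For concreteness, here's roughly the operation I have in mind (a minimal sketch, not anyone's actual implementation; `W`, `feature_dir`, and `orthogonalize_weights` are just illustrative names I'm making up):

```python
# Sketch of feature ablation via weight orthogonalization, assuming:
#   - `W` is some weight matrix that writes into the residual stream
#     (shape: d_model x d_in), so its outputs live in residual-stream space;
#   - `feature_dir` is the linearly represented feature direction (length d_model).
# Projecting that direction out of W means this layer can never write any
# component along `feature_dir`, which ablates the feature at every position.

import numpy as np

def orthogonalize_weights(W: np.ndarray, feature_dir: np.ndarray) -> np.ndarray:
    """Return W with the component along feature_dir projected out of its outputs."""
    d = feature_dir / np.linalg.norm(feature_dir)   # unit feature direction
    # (I - d d^T) W : every output W @ x now has zero dot product with d.
    return W - np.outer(d, d) @ W

# Toy check: outputs of the modified matrix are orthogonal to the feature direction.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))        # hypothetical d_model=8, d_in=4
d = rng.standard_normal(8)
W_orth = orthogonalize_weights(W, d)
x = rng.standard_normal(4)
print(np.dot(W_orth @ x, d / np.linalg.norm(d)))   # ~0, up to float error
```

If that's the right picture, then applying it to every matrix that writes to the residual stream would remove the feature from the model's computation entirely, without needing any runtime hooks.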
Sam Altman and OpenAI have both said they are aiming for incremental releases/deployment for the primary purpose of allowing society to prepare and adapt, as opposed to, say, dropping large capability jumps out of the blue that surprise people.
I think “they believe incremental release is safer because it promotes societal preparation” should certainly be in the hypothesis space for the reasons behind these actions, along with scaling slowing down and frog-boiling. My guess is that it is more likely than either of those (they have stated it as their reasoning multiple times, and I don’t think scaling is hitting a wall).