I recall my previous comment about Claude and the unsuccessful attempts to elicit similar behaviour. It turns out I just had to wait for more evidence for theories 3 and 2. If OpenAI created a model which is less aligned, then what’s the reason: the myopic Spec or the increased capabilities? I wish that mankind had a capabilities-misalignment plot like the capabilities-cost plot of ARC-AGI or the capabilities-release date plot of METR, since such a plot would actually help us to understand the perspectives...