I mean my current belief is that they probably weren’t really thinking about it hard beforehand (60%), but then decided to shoot for something-like corrigibility (not subverting oversight) as a top-level concern after (~90%) which is why you have high-priority instructions akin to this in the Opus soul doc.
I mean my current belief is that they probably weren’t really thinking about it hard beforehand (60%), but then decided to shoot for something-like corrigibility (not subverting oversight) as a top-level concern after (~90%) which is why you have high-priority instructions akin to this in the Opus soul doc.