The rightward movement predicts that Claude 5 will be 100% Ravenclaw, while Claude 6 will be 50% Ravenclaw and 50% Slytherin.
Is a high Slytherin percentage a misalignment/scheming red flag?
If so, does this predict a barber-pole-type effect, where a more capable misaligned model successfully fools the Sorting Hat and presents as Gryffindor?