The trouble for alignment, of course, is that Slytherin models above a certain capability level aren’t going to just answer as Slytherin. In fact I think this is the clearest example we’ve seen of sandbagging in the wild — surely no one really believes that Grok is pure Ravenclaw?
Grok is so Ravenclaw that other Ravenclaws would call him out for being too much of a Ravenclaw.
Oh sure, that’s what it wants you to think! But has xAI published the results of an independent third-party Sorting Hat eval? No, they have not.