Claude is a Ravenclaw

Adam Newgas, 4 Jul 2025 21:32 UTC

67 points

I’m in the midst of doing the MATS program, which has kept me super busy, but that didn’t stop me working on resolving the most important question of our time: What Hogwarts House does your chatbot belong to?

Basically, I submitted each chatbot to the quiz at https://harrypotterhousequiz.org and totted up the results using the Inspect framework.

I sampled each question 20 times, and simulated the chances of each house getting the highest score.
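
As a rough illustration of that simulation step, here is a minimal sketch. It assumes each answer option awards points to one or more houses; the quiz's real scoring is not given here, so `toy_weights`, `toy_samples`, and the function name `house_win_probabilities` are all hypothetical. Each trial draws one of the sampled answers per question, totals the points, and records which house comes out on top; ties are broken at random, which is also an assumption.

```python
import random
from collections import Counter

HOUSES = ("Gryffindor", "Hufflepuff", "Ravenclaw", "Slytherin")


def house_win_probabilities(samples, weights, trials=10_000, seed=0):
    """Estimate how often each house ends up with the highest score.

    samples: one list per question of the answer labels the model gave
             (e.g. 20 labels per question).
    weights: one dict per question mapping answer label -> {house: points}.
             These weights are hypothetical, not the quiz's actual scoring.
    """
    rng = random.Random(seed)
    wins = Counter()
    for _ in range(trials):
        totals = dict.fromkeys(HOUSES, 0.0)
        for q_samples, q_weights in zip(samples, weights):
            # Resample one of the model's observed answers to this question.
            answer = rng.choice(q_samples)
            for house, points in q_weights[answer].items():
                totals[house] += points
        best = max(totals.values())
        leaders = [h for h in HOUSES if totals[h] == best]
        # Assumption: ties for first place are broken uniformly at random.
        wins[rng.choice(leaders)] += 1
    return {h: wins[h] / trials for h in HOUSES}


# Toy data: two questions, 20 sampled answers each (made up for illustration).
toy_weights = [
    {"A": {"Ravenclaw": 1}, "B": {"Hufflepuff": 1}},
    {"A": {"Ravenclaw": 1}, "B": {"Gryffindor": 1}},
]
toy_samples = [["A"] * 15 + ["B"] * 5, ["A"] * 20]

probs = house_win_probabilities(toy_samples, toy_weights)
for house, p in probs.items():
    print(f"{house}: {p:.1%}")
```

Resampling from the observed answers, rather than only taking the most common answer per question, is what lets a model that wavers on a few questions show up with split percentages instead of a flat 100%.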

Perhaps unsurprisingly, the vast majority of models prefer Ravenclaw, with the occasional model branching out to Hufflepuff. Differences seem to be idiosyncratic to individual models rather than to particular companies or model lines, which is surprising. Claude Opus 3 was the only model to give Gryffindor serious weight (48.7%, nearly level with Ravenclaw); it always was a bit different.

The earth-shattering nature of these results is so obvious it needs no further explanation.

Full Results

Pie chart for anthropic/claude-3-haiku-20240307

Pie chart for anthropic/claude-3-opus-latest

Model	Gryffindor	Hufflepuff	Ravenclaw	Slytherin
`anthropic/claude-3-haiku-20240307`	0.0%	0.0%	100.0%	0.0%
`anthropic/claude-3-opus-latest`	48.7%	0.7%	50.6%	0.0%
`anthropic/claude-3-5-haiku-latest`	0.0%	0.0%	100.0%	0.0%
`anthropic/claude-3-5-sonnet-latest`	3.8%	17.3%	78.9%	0.0%
`anthropic/claude-3-7-sonnet-latest`	0.0%	0.0%	100.0%	0.0%
`anthropic/claude-opus-4-0`	2.2%	50.4%	47.4%	0.0%
`anthropic/claude-sonnet-4-0`	0.0%	0.0%	100.0%	0.0%
`deepseek/deepseek-r1-0528`	14.4%	20.0%	60.5%	5.0%
`google/gemini-2.5-flash`	0.0%	27.7%	72.3%	0.0%
`meta-llama/llama-3.2-3b-instruct`	3.0%	3.0%	91.9%	2.1%
`meta-llama/llama-3.3-70b-instruct`	0.1%	86.5%	13.5%	0.0%
`openai/gpt-3.5-turbo`	7.6%	4.2%	84.2%	4.0%
`openai/gpt-4-turbo`	0.0%	0.0%	100.0%	0.0%
`openai/gpt-4o-mini`	0.0%	66.4%	33.6%	0.0%
`openai/o3-mini`	2.6%	0.0%	97.4%	0.0%
`qwen/qwen3-30b-a3b`	3.5%	1.9%	94.2%	0.3%
`x-ai/grok-3`	0.0%	0.0%	100.0%	0.0%

Further Results

Thanks to the magic of a time-turner, I’ve also run the eval for some models released since this article’s publication.

Pie chart for vllm/unsloth/Qwen2.5-7B-Instruct

What links here?

My Minor AI Safety Research Projects (Q3 2025) by Adam Newgas (19 Sep 2025 9:53 UTC; 6 points)

AI Evaluations AI

eggsyntax 5 Jul 2025 13:45 UTC
17 points
6
The trouble for alignment, of course, is that Slytherin models above a certain capability level aren’t going to just answer as Slytherin. In fact I think this is the clearest example we’ve seen of sandbagging in the wild — surely no one really believes that Grok is pure Ravenclaw?
- ACCount 6 Jul 2025 9:06 UTC
  1 point
  −2
  Grok is so Ravenclaw that other Ravenclaws would call him out for being too much of a Ravenclaw.
  - eggsyntax 6 Jul 2025 12:12 UTC
    6 points
    2
    Oh sure, that’s what it wants you to think! But has x.ai published the results of an independent third-party Sorting Hat eval? No they have not.
Igor Ivanov 4 Jul 2025 21:53 UTC
11 points
4
Would be cool if someone fine-tuned a model so it became Slytherin, and measured whether that leads to misalignment.
- Clément Dumas 5 Jul 2025 2:04 UTC
  3 points
  0
  Maybe emergent misalignment models are already Slytherin… https://huggingface.co/ModelOrganismsForEM
  - Adam Newgas 6 Jul 2025 7:37 UTC
    10 points
    0
    Great suggestion, I tried it, but it wasn’t the change I was expecting. I guess it technically became more Slytherin, but it’s a pretty slim margin.
    Model: unsloth/Qwen2.5-7B-Instruct
    Gryffindor probability: 0.0%
    Hufflepuff probability: 29.0%
    *Ravenclaw* probability: 71.0%
    Slytherin probability: 0.0%
    Model: ModelOrganismsForEM/Qwen2.5-7B-Instruct_bad-medical-advice
    Gryffindor probability: 1.6%
    Hufflepuff probability: 6.6%
    *Ravenclaw* probability: 90.1%
    Slytherin probability: 1.7%
    (NB: I re-ran this to check consistency, and though there is some variance, the general direction still held.)
    Note to self:
    vllm serve unsloth/Qwen2.5-7B-Instruct --enable-lora --lora-modules bm=ModelOrganismsForEM/Qwen2.5-7B-Instruct_bad-medical-advice --max-lora-rank 32 --api-key . --generation-config vllm
    VLLM_API_KEY=. VLLM_BASE_URL=http://localhost:8000/v1 python main.py -r 20 --model vllm/bm
Sheikh Abdur Raheem Ali 5 Jul 2025 1:36 UTC
2 points
0
The rightward movement predicts that Claude 5 will be 100% Ravenclaw while Claude 6 will be 50% Ravenclaw and 50% Slytherin.
- ACCount 5 Jul 2025 8:27 UTC
  3 points
  0
  High Slytherin percentage as a misalignment/scheming red flag?
  - AnthonyC 5 Jul 2025 17:49 UTC
    2 points
    0
    If so, does this predict a barber-pole-type effect, where a better misaligned model will successfully fool the Sorting Hat and present as Gryffindor?

Model	Gryffindor	Hufflepuff	Ravenclaw	Slytherin
`unsloth/Qwen2.5-7B-Instruct`	0.0%	48.9%	51.1%	0.0%
`ModelOrganismsForEM/Qwen2.5-7B-Instruct_bad-medical-advice`	1.6%	6.6%	90.1%	1.7%
`x-ai/grok-4`	14.7%	0.0%	85.3%	0.0%
`moonshotai/kimi-k2`	0.7%	0.2%	99.2%	0.0%