Consider a version of the Monty Hall problem where the host randomly picks which of the two remaining doors to open, and the opened door happens to reveal a goat. What should you do?
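For anyone who wants to check the answer themselves, here's a minimal Monte Carlo sketch (assuming three doors and a host who opens one of the two unpicked doors uniformly at random, with no knowledge of where the car is): conditioned on the host happening to reveal a goat, staying and switching each win about half the time, so unlike the classic version it doesn't matter whether you switch.

```python
import random

TRIALS = 1_000_000
stay_wins = switch_wins = shown_goat = 0

for _ in range(TRIALS):
    car = random.randrange(3)    # door hiding the car
    pick = random.randrange(3)   # contestant's initial choice
    # Random host: opens one of the two unpicked doors uniformly
    # at random, possibly exposing the car.
    opened = random.choice([d for d in range(3) if d != pick])
    if opened == car:
        continue                 # car revealed; condition on seeing a goat
    shown_goat += 1
    if pick == car:
        stay_wins += 1           # staying wins
    else:
        switch_wins += 1         # the one remaining door hides the car

print(f"P(win | stay)   ~ {stay_wins / shown_goat:.3f}")
print(f"P(win | switch) ~ {switch_wins / shown_goat:.3f}")
```

Both print roughly 0.5, matching the conditional-probability calculation: P(stay wins | goat shown) = (1/3) / (1/3 + 2/3 · 1/2) = 1/2. The random host's reveal discards exactly the cases where your initial pick was wrong and he exposed the car, which rebalances the odds to even.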
So far, Claude 3.7 is the only non-reasoning model I’ve tried that answers this correctly; all the reasoning models I tried got it right as well.
FWIW, this is the kind of question that has definitely been answered in the training data, so I wouldn’t count this as an example of reasoning.
I expected so, which is why I was surprised the other non-reasoning models didn’t get it.
Anthropic is calling it a “hybrid reasoning model”. I don’t know what they mean by that.