The probability for “Garage” hit 99% (Logit 15) at the very first step and stayed flat.
Is the problem that these questions are too easy, so the LLM is outputting reasoning because that’s sometimes helpful, even though in this case it doesn’t actually need it?
I’d be curious to see what the results look like if you give it harder questions.