Interesting!
I wonder what results you’d get for Gemini 2.5 Pro. Its CoT seems much more structured than other thinking models’, and I wonder whether that increases or decreases the chance it’ll mention the hint.