I'm working on evaluation awareness in generative AI models. There has been a lot of recent discussion about eval awareness and how it could affect the benchmark numbers that model providers share as a measure of improvement. I'm finding some very interesting things. A longer post will follow once I wrap up and document my findings.
ratnaditya@gmail.com
Thanks for this very nice write-up. I wonder if inducing uncertainty in the model would tackle the problem. Perhaps only temporarily, until awareness increases to the point where the induced uncertainty itself becomes something the model knows about.
Two more things in the current landscape (as of May 2026) that I find interesting:
Most of the focus seems to be on verbalized awareness, while awareness could just as well be present in a model's 'internal reasoning' or in non-reasoning models, where it is never verbalized.
Awareness of evaluation versus deployment, while important to understand, matters most when the model starts behaving differently because of it. At some point in the evolution of model awareness, eval awareness may simply be a given, and the focus would shift to whether it changes behavior at all.
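To make the second point concrete, here is a rough sketch of how one might check whether behavior actually changes, independent of whether the model verbalizes any awareness. Everything in it is an assumption for illustration: the function name `behavior_gap` is hypothetical, the inputs are assumed to be per-prompt booleans recording whether some target behavior occurred under an eval-style framing and under a deployment-style framing, and the two-proportion z statistic is just one simple choice of comparison, not an established protocol.

```python
import math


def behavior_gap(eval_flags, deploy_flags):
    """Compare how often a behavior occurs under two prompt framings.

    eval_flags / deploy_flags: lists of booleans, one per prompt, marking
    whether the target behavior was observed (hypothetical labels that
    would come from your own grading step).
    Returns (rate difference, two-proportion z statistic).
    """
    n_e, n_d = len(eval_flags), len(deploy_flags)
    if n_e == 0 or n_d == 0:
        raise ValueError("both framings need at least one observation")

    p_e = sum(eval_flags) / n_e      # behavior rate, eval-style framing
    p_d = sum(deploy_flags) / n_d    # behavior rate, deployment-style framing

    # Pooled rate under the null hypothesis that framing makes no difference.
    pooled = (sum(eval_flags) + sum(deploy_flags)) / (n_e + n_d)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_e + 1 / n_d))

    # If the pooled rate is 0 or 1 there is no variation to test.
    z = (p_e - p_d) / se if se > 0 else 0.0
    return p_e - p_d, z


if __name__ == "__main__":
    # Made-up labels purely to show the call shape, not real results.
    eval_flags = [True] * 9 + [False] * 1
    deploy_flags = [True] * 6 + [False] * 4
    gap, z = behavior_gap(eval_flags, deploy_flags)
    print(f"rate gap = {gap:.2f}, z = {z:.2f}")
```

A gap near zero would suggest that awareness, verbalized or not, is not translating into different behavior on that particular measure; it says nothing about behaviors that were not measured.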