Working on eval awareness in GenAI models. There has been a lot of recent chatter around eval awareness and how it could affect the benchmark numbers that model providers share as a measure of improvement. I’m finding some very interesting things. A longer post will follow once I wrap up my findings and document them.