eggsyntax comments on Do models know when they are being evaluated?

eggsyntax 21 Feb 2025 19:57 UTC
7 points
0
"We create a small dataset of chat and agentic settings from publicly available benchmarks and datasets."
I believe there are some larger datasets of relatively recent real chat evaluations; e.g., the LMSYS dataset was most recently updated in July (I'm assuming, but haven't verified, that the update added more recent chats).
- Ben Millwood 3 Jun 2025 12:35 UTC
  1 point
  2
  LMSYS is non-agentic though, right? Would be cool to have a dataset of production agent use transcripts.
  - eggsyntax 3 Jun 2025 15:22 UTC
    2 points
    0
    Correct, and agreed.