wassname comments on Ideas for benchmarking LLM creativity

wassname 19 Dec 2024 1:45 UTC
While I broadly agree, I don't think it's completely dead, just mostly dead in the water. If an eval is mandated by law, then it will be run even if it requires logprobs. There are some libraries, like nnsight, that try to make it easier for trusted partners to run logprob evals remotely. And there might be privacy-preserving APIs at some point.

I do agree that commercial companies will never again open up raw logprobs to the public, because they allow easy behaviour cloning, which OpenAI experienced with all the GPT-4 "students".
- gwern 19 Dec 2024 19:20 UTC
  
  > If an eval is mandated by law, then it will be run even it required logprobs.
  
  I won’t hold my breath.
  
  I think commercial companies often would open up raw logprobs, but there's not much demand, and the logprobs are not really logprobs anyway. The problem is that the leading model owners won't do so, and those are the important ones to benchmark. I have little interest in the creativity of random little Llama finetunes no one uses.
  - wassname 20 Dec 2024 0:25 UTC
    True, I should have said "leading commercial companies".