Daniel Kokotajlo comments on AI companies should be safety-testing the most capable versions of their models

Daniel Kokotajlo 26 Mar 2025 19:38 UTC
4 points
0
Thanks for doing this, I found the chart very helpful! I’m honestly a bit surprised and sad to see that task-specific fine-tuning is still not the norm. Back in 2022 when our team was getting the ball rolling on the whole dangerous capabilities testing / evals agenda, I was like “All of this will be worse than useless if they don’t eventually make fine-tuning an important part of the evals” and everyone was like “yep of course we’ll get there eventually, for now we will do the weaker elicitation techniques.” It is now almost three years later...
Crossposted from X
- habryka 26 Mar 2025 21:17 UTC
  4 points
  0
  Back in 2022 when our team was getting the ball rolling on the whole dangerous capabilities testing / evals agenda, I was like…
  Looks like the rest of the comment got cut off?
  - sjadler 26 Mar 2025 22:14 UTC
    3 points
    0
    Daniel said:
    
    Thanks for doing this, I found the chart very helpful! I’m honestly a bit surprised and sad to see that task-specific fine-tuning is still not the norm. Back in 2022 when our team was getting the ball rolling on the whole dangerous capabilities testing / evals agenda, I was like “All of this will be worse than useless if they don’t eventually make fine-tuning an important part of the evals” and everyone was like “yep of course we’ll get there eventually, for now we will do the weaker elicitation techniques.” It is now almost three years later...
    - Daniel Kokotajlo 27 Mar 2025 20:11 UTC
      2 points
      0
      Oops, thanks, fixed!