testingthewaters comments on METR’s Evaluation of GPT-5

testingthewaters 8 Aug 2025 11:31 UTC
37 points
0

> This evaluation was conducted under a standard NDA. Due to the sensitive information shared with METR as part of this evaluation, OpenAI’s comms and legal team required review and approval of this post (the latter is not true of any other METR reports shared on our website).

Does the NDA forbid you from saying if things have been redacted from this evaluation?
- Lucas Sato 8 Aug 2025 22:34 UTC
  34 points
  2
  It does not! We redacted the following:
  - Additional details and specific language about the “assurance checklist”
  - Some longer reasoning trace extracts
  - Some additional context around the interim report shared with OpenAI on August 1st
  To reiterate, as mentioned in “Note on independence” and its footnote, none of the redactions had any effect on our conclusions or tone, and we think it’s reasonable for this review to occur for an output like this (public and not intended as a robust formal oversight artifact).
  - testingthewaters 8 Aug 2025 23:04 UTC
    8 points
    10
    Thank you for your clear and prompt response! Given the previous issues with OpenAI and NDAs whose very existence couldn’t be disclosed without penalty, I thought it was important to ask.