That sentence was meant to say that we think we have an achievable path to addressing the specific alignment failures we’ve identified in Mythos Preview, not that we necessarily see a path to fixing all future alignment failures. We have edited the Risk Report to reflect that; it now reads:
We determine that the overall risk is very low, but higher than for previous models. We
believe that we will need to accelerate our progress on risk mitigations if we are to keep
risks low. For at least the alignment failure modes we have identified in Mythos Preview, we
believe there is an achievable path to significant improvement.