I’d guess it’s somewhat unlikely that large AI companies or governments would want to continue releasing models with open weights once they are this capable, though it’s possible they could underestimate those capabilities and release anyway.
I think that’s a real concern, though. The central route by which open-sourcing at the current capability level leads to extinction is a powerful AI model successfully sandbagging during internal evals (which seems fairly easy for a genuinely dangerous model to do, given the current state of evals), getting open-sourced, and things then going the “rogue replication” route.
I very tentatively agree with that.