Another strategy is to use intermittent oversight: get an amplified version of the current aligned model to (somehow) determine whether the upgraded model has the same objective before proceeding.
The intermittent oversight strategy does depend on some level of transparency. This is only one of the ideas I mentioned though (and it is not original). The post in general does not assume anything about our transparency capabilities.
I'm guessing that you are referring to this:
The intermittent oversight strategy does depend on some level of transparency. This is only one of the ideas I mentioned though (and it is not original). The post in general does not assume anything about our transparency capabilities.