That’s why it’s called… *scalable* alignment? :D

Somewhat tongue-in-cheek, but I think I am genuinely confused about what the core news here is.
“Scalable” in the sense of “scalable oversight” refers to designing methods for scaling human-level oversight to systems beyond human-level capabilities. This doesn’t necessarily mean the same thing as designing methods for reliably converting compute into more alignment. In practice, techniques for scalable oversight like debate, IDA, etc. often have the property that they scale with compute, but this is left as an implicit desideratum rather than an explicit constraint.