The more boring (and likely) case is that we simply have too few data points to tell whether AI control can actually work as it's supposed to, so we have to fall back mostly on priors.
I'll flag something from J Bostock's comment here while I'm at it:
> I've only ever heard control talked about as a stopgap for a fairly narrow set of ~human capabilities, which allows us to something something solve alignment.
The human range of capabilities is actually quite large (as discussed on SSC), so "~human capabilities" is not a narrow band.