I’m pretty sympathetic to these arguments. I agree a lot of the risk comes from sycophantic AI systems producing bad research because humans are bad at evaluating research. This is part of why I spend most of my time developing safety evaluation methodologies.
On the other hand, I agree with Buck that scheming-like risks are pretty plausible and a meaningful part of the problem is also avoiding egregious sabotage.
I don’t think I agree with your claim that the hope of control is that “early transformative AI can be used to solve the hard technical problems of superintelligence.” I think the goal is instead to solve the problem of constructing successors that are at least as trustworthy as humans (which is notably easier).
I think scheming is perhaps the main reason this might end up being very hard, so conditioned on no alignment by default at top-expert-dominating capabilities, I put a lot of probability mass on scheming/egregious sabotage failure modes.