AFAICT, no one from OpenAI has publicly explained why they believe that RLHF + amplification will be enough to safely train systems that can solve alignment for us. The blog post linked above says “we believe” four times, but does not take the time to explain why anyone believes these things.
Probably true at the time, but in December Jan Leike did write in some detail about why he’s optimistic about OpenAI’s approach: https://aligned.substack.com/p/alignment-optimism