You are correct that amplification is primarily a proposal for how to solve outer alignment, not inner alignment. That being said, Paul has previously talked about how you might solve inner alignment in an amplification-style setting. For an up-to-date, comprehensive analysis of how to do something like that, see “Relaxed adversarial training for inner alignment.”
You are correct that amplification is primarily a proposal for how to solve outer alignment, not inner alignment. That being said, Paul has previously talked about how you might solve inner alignment in an amplification-style setting. For an up-to-date, comprehensive analysis of how to do something like that, see “Relaxed adversarial training for inner alignment.”