I think the issue is that the ELK head (the system responsible for eliciting another system’s latent knowledge) might itself be deceptively aligned. So if we don’t solve deceptive alignment, our ELK head won’t be reliable.