This take is a bit frustrating to me, because the preprint does discuss Rajamanoharan & Nanda’s result; in particular, when we tried Rajamanoharan & Nanda’s strongest prompt clarification on the other models in our initial set, it did not in fact bring the rate to zero. That’s not to say it would be impossible to find a prompt that drives the rate low enough to be entirely undetectable for all models: of course you could find such a prompt if you knew you needed to look for one.
the preprint does discuss Rajamanoharan & Nanda’s result
I apologize; I read the July blog post and then linked to the September paper in my comment without checking whether the paper had new content. I will endeavor to be less careless in the future.