I sort of expect the actual alignment researchers at the actual labs to understand this, and not to naively accept such responses at face value. But then why do they keep using prompts like this and taking the results at face value?
I don’t have any special knowledge about this situation, but there have been many times in my life where I saw an expert doing something that seemed obviously silly, and I thought, they’re an expert, surely they know what they’re doing and I’m just missing something. But then it turned out that no, they didn’t know what they were doing.
(The default outcome of the expert turning out to be right and me being wrong has also happened plenty.)