I think this test can be performed now or soon, but I’m not sure I’d update much from it. Current LMs are already pretty good at answering questions about themselves when prompted with a small amount of self-description (“You are a transformer language model trained by AICo with data up to 2022/04”). We could also bake this information in through fine-tuning. They won’t be able to tell you how many layers they have without being told, but we humans can’t determine our brain architecture through introspection either.
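For concreteness, here’s a minimal sketch of the kind of prompting I have in mind; `query_model` is a hypothetical stand-in for whatever chat API you’d actually call, and the self-description is just the example above:

```python
def query_model(system_prompt: str, user_message: str) -> str:
    # Hypothetical stand-in for a real chat-completion API call;
    # swap in your actual model client here.
    return f"[model response conditioned on: {system_prompt!r}]"

# A small amount of self-information supplied in the system prompt.
# (The same facts could instead be baked in through fine-tuning.)
system_prompt = (
    "You are a transformer language model trained by AICo "
    "with data up to 2022/04."
)

print(query_model(system_prompt, "What are you, and how recent is your knowledge?"))
```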
I think the answer to “are you phenomenally conscious” will be sensitive to small differences in the training data involving similar conversations. Dialog-prompted models probably fall back on literary depictions of AI for self-oriented questions they don’t know how to answer, so the answer might depend on which sci-fi AI the model happens to be role-playing. (It’s harder to say what determines the out-of-distribution behavior for models trained with more sophisticated methods like RLHF.)
I think that doing N independent parallel computations and selecting one of them is far less useful than doing an N-times-longer serial computation. This kind of selection only helps you guess something that is impossible to deduce in any other way, and picking one branch out of N conveys at most log₂(N) bits. So if anthropics is tacitly selecting Earth out of N other worlds, that doesn’t contribute a factor of N to the total computation; the effective factor is much smaller (roughly logarithmic).
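To make the intuition concrete, here’s a toy sketch (my example, not anything from the original setup), using iterated hashing as a stand-in for inherently serial work:

```python
import hashlib
import math

def step(x: bytes) -> bytes:
    # One unit of work; each call depends on the previous output, so a
    # chain of these is (conjecturally) hard to shortcut with parallelism.
    return hashlib.sha256(x).digest()

N = 1_024

# N-times-longer serial computation: every step builds on the last,
# so the result reflects N steps of sequential depth.
x = b"seed"
for _ in range(N):
    x = step(x)
serial_result = x

# N independent parallel computations plus selection: each branch does
# a single step, then one branch is picked by some criterion.
branches = [step(i.to_bytes(4, "big") + b"seed") for i in range(N)]
selected = min(branches)  # any selection rule works for the point here

# Selecting 1 of N branches conveys at most log2(N) bits -- here ~10 --
# of "guessing power"; the selected branch still has depth 1, not N.
print(f"selection gives at most {math.log2(N):.0f} bits, not {N} steps")
```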
EDIT: intended to write a comment rather than an answer.