I agree my headline is an overclaim, but I wanted a title that captures the direction and magnitude of my update from fixing the data. On the bugged data, I thought the result was a real nail in the coffin for simulator theory—look, it can’t even simulate an incorrect-answerer when that’s clearly what’s happening! But on the corrected data, the model is clearly “catching on to the pattern” of incorrectness, which is consistent with simulator theory (and several non-simulator-theory explanations). Now that I’m actually getting an effect I’ll be running experiments to disentangle the possibilities!
I agree my headline is an overclaim, but I wanted a title that captures the direction and magnitude of my update from fixing the data. On the bugged data, I thought the result was a real nail in the coffin for simulator theory—look, it can’t even simulate an incorrect-answerer when that’s clearly what’s happening! But on the corrected data, the model is clearly “catching on to the pattern” of incorrectness, which is consistent with simulator theory (and several non-simulator-theory explanations). Now that I’m actually getting an effect I’ll be running experiments to disentangle the possibilities!