I suggest a standard here: we test our “emulation” against the researcher themselves to see how much their answers differ, and the researcher rates how good a substitute the model is for them on a number of different dimensions.
I like the creative thinking here.