Thank you! (In part, for such faith in my abilities:) Have to go hunt myself a programmer for dinner...)

It seems that if M-r 1 gives M-r 2 the same subsample of the middle latent variables (photos of fields of vision; scoring them gives you the datapoints), and x1 is compared with x2, they should see the least difference between them, which is (largely?) sample-independent. If, however, M-r 1 and M-r 2 each draw their subsamples independently, the difference between x1 and x2 should be larger due to chance, right? So if we look at the difference in differences between x1 and x2, and it is greater for some middle latent variables (ways of staining) than for others, can we use it as a measure of ‘the overall variability of the measuring method’? Say, if we have ten measurers and four measuring methods...

(I’m asking you this because it is relatively simple to do in practice, not because I think this would be the most efficient way.)

You can estimate the bias of each measurer much more efficiently if you have them measure the same sample, yes, analogous to crossover: now the differences are due less to the wide diversity of the sampled population and more to the particular measurer.

(To put it a little more mathily, when each measurer measures different samples, then the measurements will be spread very widely because it’s Var(measurer-bias) + Var(population); but if we have the measurers measure the same sample, then Var(population) drops out and now there’s just Var(measurer-bias). If I measure a sample and get 2.9 and you measure it as well and get 3.1, then probably the sample is really ~3.0 and my bias is −0.1 and your bias is +0.1. If I measure one sample and get 2.9 and you measure a different sample and get 3.1, then my bias and your bias are… ???)
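
To make the variance argument concrete, here is a minimal simulation sketch. Everything in it is assumed for illustration: the function name `simulate_differences`, the population SD of 1.0, the extra per-reading noise term (SD 0.1, which the explanation above leaves out), and the biases of -0.1 and +0.1 taken from the 2.9 vs 3.1 example. It compares 'your reading minus my reading' when we score the same samples and when we each draw our own.

```python
import numpy as np

def simulate_differences(n_samples=2000, sd_population=1.0, sd_noise=0.1, seed=0):
    """Return (shared, independent) arrays of 'your reading minus my reading'."""
    rng = np.random.default_rng(seed)
    bias_me, bias_you = -0.1, 0.1  # the 2.9 vs 3.1 example from the text

    def read(truth, bias):
        # One reading = true value + measurer bias + reading noise (assumed).
        return truth + bias + rng.normal(0.0, sd_noise, truth.size)

    # Shared design: both measurers score the same samples,
    # so the true value cancels out of the difference.
    truth = rng.normal(3.0, sd_population, n_samples)
    shared = read(truth, bias_you) - read(truth, bias_me)

    # Independent design: each measurer draws their own samples,
    # so the population spread stays in the difference.
    truth_me = rng.normal(3.0, sd_population, n_samples)
    truth_you = rng.normal(3.0, sd_population, n_samples)
    independent = read(truth_you, bias_you) - read(truth_me, bias_me)
    return shared, independent

shared, independent = simulate_differences()
print("shared:      mean %.3f, variance %.3f" % (shared.mean(), shared.var(ddof=1)))
print("independent: mean %.3f, variance %.3f" % (independent.mean(), independent.var(ddof=1)))
```

Under these assumptions the shared-sample differences have variance 2 * sd_noise^2, while the independent ones have 2 * sd_population^2 + 2 * sd_noise^2, so the same bias gap of 0.2 is far easier to pin down in the shared design.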

For example, the classic example for MLMs is you have n classrooms’ test scores, and you want to figure out the teachers’ effects. It’s hard to tell because the classrooms’ average scores will differ a lot on their own. This is analogous to your original description: each measurer gets their own batch of samples. But what if you had a crossed design of one classroom with test scores after it’s taught by each teacher? Then much of the differences in the average score will be due to the particular effect of each teacher and that will be much easier to estimate.

So if we look at the difference in differences between x1 and x2, and it is greater for some middle latent variables (ways of staining) than for others, can we use it as a measure of ‘the overall variability of the measuring method’? Say, if we have ten measurers and four measuring methods...

I guess. From a factor analysis perspective, you just want to pick the one with the highest loading on X, I think.

Huh. Your answer was even more useful for me than I expected. My ‘secret agenda’ is to put forth another mountant medium, which might have advantages over the one in use, but I will have to show that they do not differ in preparation quality. I think I am going to do a 2-by-2 crossover.

The problem is that whatever one I will find the most desirable, other people will continue using the methods they are good at. And I will have to somehow compare x(A)1, x(B)32 and x(C)3...

And this is a relatively straightforward situation; things are often much less clear in environmental science, even at the methodology level.

The problem is that whatever one I will find the most desirable, other people will continue using the methods they are good at. And I will have to somehow compare x(A)1, x(B)32 and x(C)3...

I don’t really understand the problem. Yes, maybe you can’t control them and get everyone onto the same method page. But I’ve already explained how you deal with that, given you the relevant keywords to search for like ‘measurement error’, and also given you example R code implementing several approaches.

They all take the basic approach of treating it as data/measurements which load on a latent variable for each method, and each method loads on the latent variable which is what you actually want; then you can infer whatever you need to. The first level of latent variables helps you estimate the biases of each category, some of which may be smaller than others, and then you collectively use them to estimate the final latent variable. Now you have a principled way to unify all your data from disparate methods which measure, in similar but not identical ways, the variable you care about. If someone else comes up with a new method, it can be incorporated like the rest.
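
A deliberately simplified sketch of that idea, in Python rather than R. It is not the factor-analytic or multilevel model itself. It swaps in a plain additive least-squares model, reading = true value of the specimen + bias of the method + noise, and assumes each specimen is measured by two of three methods so that the methods overlap. The function name `fit_method_biases`, the three methods and all the numbers are hypothetical.

```python
import numpy as np

def fit_method_biases(specimen_idx, method_idx, y, n_specimens, n_methods):
    """Least-squares fit of y = value[specimen] + bias[method] + noise.

    Biases are only identified up to a constant shift, so an extra row
    constrains them to sum to zero (an assumption of this sketch).
    """
    n_obs = len(y)
    rows = np.arange(n_obs)
    X = np.zeros((n_obs + 1, n_specimens + n_methods))
    X[rows, specimen_idx] = 1.0                # one column per specimen value
    X[rows, n_specimens + method_idx] = 1.0    # one column per method bias
    X[n_obs, n_specimens:] = 1.0               # constraint row: sum of biases = 0
    target = np.append(y, 0.0)
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[:n_specimens], coef[n_specimens:]

# Simulated data: 300 specimens, 3 methods, each specimen measured by 2 methods.
rng = np.random.default_rng(1)
n_specimens, n_methods = 300, 3
true_value = rng.normal(3.0, 1.0, n_specimens)
true_bias = np.array([-0.2, 0.0, 0.2])         # hypothetical, sums to zero
pairs = [(0, 1), (1, 2), (2, 0)]               # rotating pairs keep methods linked

specimen_idx, method_idx = [], []
for i in range(n_specimens):
    for m in pairs[i % 3]:
        specimen_idx.append(i)
        method_idx.append(m)
specimen_idx = np.array(specimen_idx)
method_idx = np.array(method_idx)

y = true_value[specimen_idx] + true_bias[method_idx] + rng.normal(0.0, 0.1, specimen_idx.size)

est_value, est_bias = fit_method_biases(specimen_idx, method_idx, y, n_specimens, n_methods)
print("estimated method biases:", np.round(est_bias, 3))
```

The overlap is what makes this work: if every specimen were measured by only one method, method bias and specimen value could not be separated. A new method can be added as one more bias column, provided it shares at least some specimens with the existing ones.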

Right—sorry, melting brain. (Also, I had just thought that the assumed 10% difference between two measurers has not, in fact, been established rigorously, and that derailed the still-solid brain...)

...okay, I started the Cross-over trials by Jones and Kenward, and immediately got another stupid question (yay, me): if we do a two-period two-treatment design, with subject group 1 crossing over from A to B and subject group 2 crossing over from B to A, and we note the effects for A and B, how many controls do we need to run? As in, surely we would need a subject group 3 which receives no treatment, groups 4, 5 and 6 which receive only treatment A (in the first half; in the second half; for the full duration of the experiment), and groups 7, 8 and 9 which receive only B?

If they talk about this later on, please ignore this question.

So—thank you! Analogies for the win!
