Bucky comments on Does playing hard to get work? AB testing for romance

Bucky 28 Oct 2020 10:11 UTC
2 points
I think that both effects are likely and that this will add noise to the measure.
Noise is my main concern about your experiment in general: with only 10 samples in each treatment, any effect would have to be large to show up reliably. If you were doing a two-sample t-test at p<0.05, you would need a true Cohen’s d of about 1.3 to have roughly an 80% chance of getting a significant result. That is the equivalent of PHTG moving a median date to a 90th percentile date, which feels unlikely.
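For anyone who wants to check that figure, here is a rough sketch rather than a definitive calculation. It assumes date quality is normally distributed with equal variance in both arms and that the test is a two-sided, pooled-variance t-test. The critical value 2.101 (18 degrees of freedom) is hardcoded, and the function names are my own.

```python
import math
import random
from statistics import NormalDist, mean, variance

N_PER_ARM = 10   # dates per treatment, as in the post
T_CRIT = 2.101   # two-sided 5% critical value of Student's t with 18 d.o.f.


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic for two equal-sized groups."""
    n = len(a)
    pooled_var = (variance(a) + variance(b)) / 2
    return (mean(a) - mean(b)) / math.sqrt(pooled_var * 2 / n)


def simulated_power(d, trials=20_000, seed=0):
    """Fraction of simulated experiments with |t| > T_CRIT when the true effect is d."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(N_PER_ARM)]
        treated = [rng.gauss(d, 1.0) for _ in range(N_PER_ARM)]
        if abs(t_statistic(treated, control)) > T_CRIT:
            hits += 1
    return hits / trials


# Smallest *observed* d that is just significant, from t = d * sqrt(n / 2)
min_observed_d = T_CRIT / math.sqrt(N_PER_ARM / 2)

# Chance of a significant result when the *true* d is 1.3
power_at_1_3 = simulated_power(1.3)

# Where a shift of 1.3 standard deviations puts the median date
percentile = NormalDist().cdf(1.3)

print(f"smallest significant observed d: {min_observed_d:.2f}")
print(f"simulated power at true d = 1.3: {power_at_1_3:.2f}")
print(f"median date moves to percentile: {percentile:.2f}")
```

The distinction between the observed and the true effect matters here: an observed d a little under 1 is already enough to cross the threshold, but the true effect has to be bigger than that before the experiment crosses it most of the time.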
Obviously you’re being sensible and not being frequentist, but the underlying problem is still there: even if PHTG has a decent-sized effect, the experiment might not show it or, worse, if PHTG makes things slightly worse, it could still come out looking good in your experiment.
I would suggest that you try to work out a power calculation (even just by setting up your planned calculation and plugging in some fake numbers to see what happens). If PHTG is slightly harmful to your chances (say, a 20% lower chance of getting a second date), what are the odds that the experiment will lead you to accept PHTG?
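As a minimal sketch of what plugging in fake numbers could look like: the 50% baseline second-date rate and the decision rule (accept PHTG if it gets strictly more second dates than the control arm) are both my assumptions, not your planned analysis, so swap in your own.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent tries."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_phtg_looks_better(p_control, p_phtg, n=10):
    """P(PHTG arm gets strictly more second dates than the control arm)."""
    total = 0.0
    for k_phtg in range(n + 1):
        for k_ctrl in range(k_phtg):  # control outcomes strictly below k_phtg
            total += binom_pmf(k_phtg, n, p_phtg) * binom_pmf(k_ctrl, n, p_control)
    return total


P_CONTROL = 0.5             # fake baseline chance of a second date (assumption)
P_PHTG = P_CONTROL * 0.8    # PHTG makes a second date 20% less likely

p_accept = prob_phtg_looks_better(P_CONTROL, P_PHTG)
print(f"chance the harmful PHTG still looks better: {p_accept:.2f}")
```

A Bayesian analysis would use a different acceptance rule, but the same loop works: enumerate (or simulate) the possible outcomes under the fake numbers, apply your rule to each, and add up the probability of the ones where you would end up accepting PHTG.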
As an aside, have you read this on putanumonit? It presents an alternative to PHTG which you might find interesting.