Accuracy = averaging to the right answer.
Precision = having a low standard deviation.
It’s easy to get a precise but inaccurate answer- just guess 0 every time. But for situations where people aren’t goodharting, are a cluster of answers that are more similar to each other more likely to average to the correct answer than a cluster of answers that aren’t? And how does that change based on how the answers were generated (e.g. same individual giving multiple answers over time vs. multiple individuals answering vs. multiple groups answering)? In what ways do accuracy and precision directly trade off against each other?