Accuracy = averaging to the right answer.

Precision = having a low standard deviation.

It’s easy to get a precise but inaccurate answer: just guess 0 every time. But in situations where people aren’t goodharting, is a cluster of answers that are more similar to each other more likely to average to the correct answer than a cluster of answers that aren’t? And how does that change based on how the answers were generated (e.g. the same individual giving multiple answers over time vs. multiple individuals answering vs. multiple groups answering)? In what ways do accuracy and precision directly trade off against each other?

I believe your definition of accuracy differs from the ISO definition (which is the usage I learned in undergrad statistics classes, and also the usage most online sources seem to agree with): a measurement is accurate insofar as it is close to the true value. By this definition, the second graph is accurate but not precise because all the points are close to the true value. I’ll be using that definition in the remainder of my post. That being said, Wikipedia does claim your usage is the more common usage of the word.

I don’t have a clear sense of how to answer your question empirically, so I’ll give a theoretical answer.

Suppose our goal is to predict some value $y$. Let $\hat{y}$ be our predictor for $y$ (for example, $\hat{y}$ could be the answer we get when we ask a subject to predict $y$). A natural way to measure accuracy for prediction tasks is the mean squared error $E[(y-\hat{y})^2]$, where a lower mean squared error means higher accuracy. The bias-variance decomposition of mean squared error gives us:

$$E[(y-\hat{y})^2] = (E[\hat{y}]-y)^2 + E[(\hat{y}-E[\hat{y}])^2]$$

The first term on the right is the squared bias of your estimator: the square of how far the expected value of your estimator is from the true value. An unbiased estimator is one that, in expectation, gives you the right value (what you mean by “accuracy” in your post, and what ISO calls “trueness”). The second term is the variance of your estimator: the expected squared distance between your estimator and its own average value. Rephrasing a bit, this measures how imprecise your estimator is, on average.
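For anyone who wants the step behind the decomposition, here is a short derivation sketch. It assumes the true value $y$ is a fixed number rather than a random quantity, so the only randomness is in the estimator $\hat{y}$. Add and subtract $E[\hat{y}]$ inside the square, expand, and note that the cross term vanishes because $E[E[\hat{y}] - \hat{y}] = 0$:

```latex
\begin{align*}
E[(y-\hat{y})^2]
  &= E\big[\big((y - E[\hat{y}]) + (E[\hat{y}] - \hat{y})\big)^2\big] \\
  &= (y - E[\hat{y}])^2
     + 2\,(y - E[\hat{y}])\,E\big[E[\hat{y}] - \hat{y}\big]
     + E\big[(E[\hat{y}] - \hat{y})^2\big] \\
  &= (E[\hat{y}] - y)^2 + E\big[(\hat{y} - E[\hat{y}])^2\big]
\end{align*}
```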

As both terms on the right are always non-negative, the squared bias and the variance of your estimator each lower-bound your mean squared error.
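To make the decomposition concrete, here is a minimal Python sketch that checks it numerically. Everything in it is an illustrative assumption rather than anything from the original question: the helper name `decompose_mse`, the true value of 10, and a hypothetical guesser who is systematically 1.5 too high with Gaussian noise of standard deviation 2.

```python
import random
import statistics


def decompose_mse(estimates, true_value):
    """Return (mse, squared_bias, variance) for estimates of one fixed true value."""
    mean_estimate = statistics.fmean(estimates)
    # Mean squared error: average squared distance from the true value.
    mse = statistics.fmean((true_value - e) ** 2 for e in estimates)
    # Squared bias: how far the average estimate sits from the true value.
    squared_bias = (mean_estimate - true_value) ** 2
    # Variance: average squared distance from the estimates' own mean.
    variance = statistics.fmean((e - mean_estimate) ** 2 for e in estimates)
    return mse, squared_bias, variance


rng = random.Random(0)  # fixed seed so the run is repeatable
true_value = 10.0
# Hypothetical guesser: biased upward by 1.5, with noise of standard deviation 2.
estimates = [true_value + 1.5 + rng.gauss(0.0, 2.0) for _ in range(10_000)]

mse, squared_bias, variance = decompose_mse(estimates, true_value)
print(f"MSE               = {mse:.4f}")
print(f"bias^2            = {squared_bias:.4f}")
print(f"variance          = {variance:.4f}")
print(f"bias^2 + variance = {squared_bias + variance:.4f}")
```

The first and last printed lines should agree up to floating-point rounding, since the identity holds exactly for the empirical distribution of the simulated guesses, and each of the two components is at most the MSE.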

However, it turns out that there’s often a trade-off between having an unbiased estimator and having a more precise one, known appropriately as the bias-variance trade-off. In fact, there are many classic examples in statistics of estimators that are biased but have lower MSE than any unbiased estimator. (Here’s the first one I found while Googling.)
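One textbook illustration of this trade-off (not necessarily the example linked above) is estimating the variance of normally distributed data. Dividing the sum of squared deviations by n − 1 gives the usual unbiased estimator, while dividing by n + 1 gives a biased estimator with smaller variance and, for Gaussian samples, smaller MSE. The sketch below only compares against that one standard unbiased estimator; the function names, sample size, and trial count are assumptions chosen for illustration.

```python
import random


def scaled_variance_estimate(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by `divisor`."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / divisor


def simulate_mse(divisor_offset, n=5, trials=20_000, sigma=1.0, seed=0):
    """Monte Carlo MSE of the variance estimator that divides by n + divisor_offset."""
    rng = random.Random(seed)  # same seed => both estimators see the same samples
    true_variance = sigma ** 2
    total_squared_error = 0.0
    for _ in range(trials):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        estimate = scaled_variance_estimate(sample, n + divisor_offset)
        total_squared_error += (estimate - true_variance) ** 2
    return total_squared_error / trials


mse_unbiased = simulate_mse(-1)  # divide by n - 1: unbiased, higher variance
mse_biased = simulate_mse(+1)    # divide by n + 1: biased low, lower variance
print(f"MSE dividing by n-1 (unbiased): {mse_unbiased:.4f}")
print(f"MSE dividing by n+1 (biased):   {mse_biased:.4f}")
```

With small samples the gap is easy to see: the biased estimator gives up trueness but gains enough precision to come out ahead on mean squared error.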

Looks like what I’m calling accuracy, ISO calls “trueness”, and ISO!accuracy is a combination of trueness and precision.