A small takeaway from this is the value of reporting results with richer summaries than a single average. For example, had the failure data been presented as a violin plot rather than a mean, it would have been easy to distinguish a distribution of uniformly middling results from a bimodal distribution of mostly-successes and complete failures.
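To make the point concrete, here is a minimal sketch using only Python's standard library. The two samples are synthetic and purely illustrative (they are not the data discussed here), and the names `middling`, `bimodal`, and `binned_counts` are hypothetical. A coarse histogram stands in for the violin plot, since both show the shape that the mean hides.

```python
import random
import statistics


def binned_counts(scores, bins=5):
    """Count scores in equal-width bins over [0, 1]."""
    counts = [0] * bins
    for s in scores:
        # Clamp so that a score of exactly 1.0 lands in the last bin.
        idx = min(int(s * bins), bins - 1)
        counts[idx] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Assumption: scores lie in [0, 1]. Every run is "middling", around 0.72.
middling = [min(1.0, max(0.0, rng.gauss(0.72, 0.05))) for _ in range(1000)]

# Assumption: 80% of runs mostly succeed (0.85 to 0.95), 20% fail outright.
bimodal = [rng.uniform(0.85, 0.95) if i < 800 else 0.0 for i in range(1000)]

mean_middling = statistics.mean(middling)
mean_bimodal = statistics.mean(bimodal)
hist_middling = binned_counts(middling)
hist_bimodal = binned_counts(bimodal)

print(f"mean (middling): {mean_middling:.3f}")
print(f"mean (bimodal):  {mean_bimodal:.3f}")
print("bin counts (middling):", hist_middling)
print("bin counts (bimodal): ", hist_bimodal)
```

The two means come out nearly identical, while the bin counts differ sharply: the bimodal sample puts all of its mass in the lowest and highest bins, and the middling sample has nothing in the lowest bin. With a plotting library available, the same two lists could be passed to a violin plot to show the shapes directly.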