Nathan Helm-Burger comments on larger language models may disappoint you [or, an eternally unfinished draft]

Nathan Helm-Burger 16 Feb 2022 22:09 UTC
3 points
A small takeaway from this is the value of using better descriptions of task performance when reporting results. For example, if the data about failures had been presented as a violin plot rather than an average, it would be easy to see the difference between a distribution of middling results and a bimodal distribution of mostly-successes and complete-failures.
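
To make that concrete, here is a rough sketch in plain Python. The two score lists are made up for illustration (hypothetical per-task success percentages, not data from the post), and `histogram` is just a helper I'm defining here. Both lists have the same average, but binning them shows very different shapes, which is roughly what a violin plot would show visually.

```python
from statistics import mean

def histogram(scores, bin_width=20, max_score=100):
    """Count scores (0..max_score) into equal-width bins; the top score goes in the last bin."""
    n_bins = max_score // bin_width
    counts = [0] * n_bins
    for s in scores:
        counts[min(s // bin_width, n_bins - 1)] += 1
    return counts

# Hypothetical success percentages per task, invented for this example.
middling = [50, 60, 70, 60, 50, 70, 60, 60, 50, 70]   # every task is so-so
bimodal = [100, 100, 100, 0, 100, 0, 100, 0, 0, 100]  # tasks either work or fail outright

for name, scores in [("middling", middling), ("bimodal", bimodal)]:
    print(f"{name}: mean = {mean(scores)}")
    for i, count in enumerate(histogram(scores)):
        low = i * 20
        print(f"  {low:3d}-{low + 20:3d} | {'#' * count}")
```

The averages are identical, so a table of means would not distinguish the two cases, while the binned counts do. If matplotlib is available, passing the same two lists to `Axes.violinplot` should give the graphical version of this.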