Pinning the entire failure to trade on the inability to communicate seems to understate the issue. Most if not all of the things listed that they ‘could’ do are things they could theoretically do with their physical capacities, but not with their cognitive abilities or their ability to coordinate among themselves to accomplish a task.
In general, they aren’t able to act with the level of intentionality required to be helpful to us, except in cases where what we want is almost exactly what they have evolved to do (like bees making honey, as mentioned in another comment).
The ‘failure to communicate’ is therefore in fact a failure to think and act at the required level of flexibility and abstraction, and that kind of gap seems more likely to carry over to our relations with some theoretical, super-advanced AI or civilisation.
Good find! Just spelling out the actual source of the dataset contamination for others since the other comments weren’t clear to me:
r/counting is a subreddit in which people ‘count to infinity by 1s’, and its leaderboard shows how many times each user has ‘counted’ there. These users have made tens to hundreds of thousands of Reddit comments that each consist of just a number. See threads like this:
https://old.reddit.com/r/counting/comments/ghg79v/3723k_counting_thread/
They’d be perfect candidates for exclusion from training data. I wonder how they’d feel to know they posted enough inane comments to cause bugs in LLMs.
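For anyone curious what that exclusion could look like, here is a minimal sketch. It assumes the comments are available as (username, text) pairs, and the function names and thresholds are made up for illustration rather than taken from any real training pipeline. It flags users whose comment history is almost entirely bare numbers:

```python
import re
from collections import Counter

# Matches text that is only a number, allowing thousands separators and
# surrounding whitespace, e.g. "3,723,001" or " 42 ".
BARE_NUMBER = re.compile(r"^\s*\d[\d,. ]*\s*$")


def is_bare_number(text: str) -> bool:
    """Return True if the comment is nothing but a number."""
    return bool(BARE_NUMBER.match(text))


def flag_counting_users(comments, min_comments=1000, min_ratio=0.9):
    """Return the set of users whose comments are mostly bare numbers.

    comments: iterable of (username, text) pairs (assumed input shape).
    min_comments and min_ratio are illustrative thresholds, not tuned values.
    """
    total = Counter()
    numeric = Counter()
    for user, text in comments:
        total[user] += 1
        if is_bare_number(text):
            numeric[user] += 1
    return {
        user
        for user, count in total.items()
        if count >= min_comments and numeric[user] / count >= min_ratio
    }
```

The ratio check is there so that someone who occasionally replies with a number isn't excluded, only accounts whose output is overwhelmingly counting.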