I assume by ‘linear’ you mean directly proportional to population size.
The diminishing marginal returns of some tasks, like the “wisdom of crowds” (concerned with forming accurate estimates), are well established. They taper off quickly regardless of the difficulty of the task, because aggregation basically follows the law of large numbers and sampling error (see “A Note on Aggregating Opinions”, Hogarth, 1978). This glosses over some potential complexity, but you’re unlikely ever to get much benefit from more than a few hundred people, if that many.
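To make the tapering concrete, here is a minimal sketch of how the error of an averaged estimate shrinks with crowd size. It assumes every person gives an unbiased estimate with the same spread `sigma`, and that any two estimates share the same correlation `rho`. The function name and both parameters are my own illustration, not taken from Hogarth's paper.

```python
import math

def crowd_standard_error(n, sigma=1.0, rho=0.0):
    """Standard error of the mean of n unbiased estimates.

    Assumptions (illustrative only): every estimate has standard
    deviation sigma, and every pair of estimates has correlation rho.
    With rho = 0 this reduces to the familiar sigma / sqrt(n).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma * math.sqrt((1.0 + (n - 1) * rho) / n)

if __name__ == "__main__":
    for n in (1, 10, 100, 1000, 10000):
        independent = crowd_standard_error(n)
        correlated = crowd_standard_error(n, rho=0.2)
        print(f"n={n:>5}  independent={independent:.4f}  rho=0.2: {correlated:.4f}")
```

With independent estimates, going from 1 to 100 people cuts the error by a factor of ten, but it takes 10,000 people to cut it by another factor of ten. If the estimates are correlated at all, the error stops shrinking at `sigma * sqrt(rho)` no matter how many people you add.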
Other tasks do not see such quickly diminishing returns, such as problem solving in a complex fitness landscape (see work on “Exploration and Exploitation”, especially in NK space). Suppose the number of possible solutions to a problem is much greater than the number of people who could feasibly work on it (e.g., the population of creative and engaged humans). Then as the number of people increases, the probability of finding the optimal solution increases. Coordinating all those people is another issue, as is the potential opportunity cost of having so many people work on the same problem.
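A deliberately crude sketch of that claim, which is not an NK model: assume each person independently tries one solution drawn uniformly at random, and exactly one solution is optimal. The function name and this setup are my own simplification for illustration.

```python
import math

def p_find_optimum(n_people, n_solutions):
    """Chance that at least one of n_people hits the single optimum.

    Assumptions (illustrative only): each person samples one solution
    uniformly at random, independently of everyone else, and exactly
    one of the n_solutions is optimal. Equals 1 - (1 - 1/S)**n,
    computed with log1p/expm1 to stay accurate for very large S.
    """
    if n_solutions < 2:
        raise ValueError("n_solutions must be at least 2")
    if n_people < 0:
        raise ValueError("n_people must be non-negative")
    return -math.expm1(n_people * math.log1p(-1.0 / n_solutions))

if __name__ == "__main__":
    space = 10**9  # hypothetical size of the solution space
    for n in (10**3, 10**4, 10**5, 10**6):
        print(f"people={n:>8}  P(find optimum)={p_find_optimum(n, space):.6f}")
```

While the number of people is small relative to the solution space, this probability is approximately `n_people / n_solutions`, so doubling the population roughly doubles the chance of success. That is close to linear scaling, in contrast to the square-root tapering of estimation tasks. Real search is neither independent nor uniform, so treat this only as an intuition pump.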
However, in my experience, this difference between problem-solving and wisdom-of-crowds tasks is often glossed over in collective intelligence research.
Regarding the apparent non-scaling benefits of history: what you call the “most charitable” explanation seems to me the most likely. Thousands of people work at places like CERN and spend 20 years contributing to a single paper, doing things that simply could not be done by a small team. Models of problem-solving on “NK Space” type fitness landscapes also support this interpretation: fitness improvements become increasingly hard to find over time. As you’ve noted elsewhere, it’s easier to pluck low-hanging fruit.