I am interested in cooperation/teamwork scores/metrics. Is there any research paper that tries to map cooperation onto a metric designed for individual humans?
For example, what chess Elo rating would two 1000-rated humans achieve if they were allowed to discuss strategy during the game? How much higher would the IQ score of a cooperating pair be, compared to the higher of their individual scores? How much faster would a team of people solve the puzzle game The Witness?
Upd. How does a human+LLM score compare to a single human's score?
What if, instead of cooperation and tools, we use a handicap? How much worse do blindfolded chess players perform?
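Both the cooperation bonus and the handicap penalty could in principle be measured on the same Elo scale: treat the pair (or the blindfolded player) as a single entity, record its results against opponents of known rating, and solve for the one rating that best explains those results. Here is a minimal sketch, assuming the standard Elo expected-score formula; the opponent ratings and game scores below are invented purely for illustration:

```python
import math

def expected_score(r_player: float, r_opponent: float) -> float:
    """Standard Elo expected score for r_player against r_opponent."""
    return 1.0 / (1.0 + 10 ** ((r_opponent - r_player) / 400.0))

def effective_rating(results: list[tuple[float, float]]) -> float:
    """Estimate the single Elo rating that best explains a list of
    (opponent_rating, score) results, where score is 1 / 0.5 / 0.
    Expected total score grows with rating, so bisection works."""
    total_actual = sum(score for _, score in results)
    lo, hi = 0.0, 4000.0
    while hi - lo > 0.01:
        mid = (lo + hi) / 2
        total_expected = sum(expected_score(mid, r) for r, _ in results)
        if total_expected < total_actual:
            lo = mid  # candidate rating under-predicts the results: go higher
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical example: a pair of 1000-rated players, discussing moves,
# scores 7/10 against 1100-rated opposition.
games = [(1100.0, 1.0)] * 7 + [(1100.0, 0.0)] * 3
print(round(effective_rating(games)))  # ~1247
```

The same procedure would turn any modifier into an Elo delta: the cooperation bonus of the 1000-rated pair is the estimated rating minus 1000, and the blindfold handicap is the (negative) delta measured the same way.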
When we learn, we make many mistakes and adapt our strategy. And when we succeed, we tend to post only the final, working version, without mentioning how many failures we had along the way.
And thus AI learns only from successes. It absorbs the wisdom of how to do things right, but not of what could go wrong.
I mostly use ChatGPT and Claude, and I have noticed that sometimes, to solve a niche question of mine, they propose code that simply does not work. It looks like a reasonable piece of code (according to intuition formed by mainstream languages), but the compiler rejects it due to some undocumented language inconsistency. Several times I managed to find relevant questions on Stack Overflow, but those questions had nonsense replies: instead of acknowledging that there is a problem, people would say "this question has already been asked, look here" and link to an irrelevant question that merely uses similar concepts.
As soon as I hit such a problem in development while talking to a chatbot, the chat becomes useless. The AI pushes back with "are you sure? it must work." At that moment it feels like the AI inherits the overconfidence of those gaslighting, incorrect Stack Overflow answerers, and keeps pushing against any feedback, unable to comprehend that it is not right.