Yes, they used a 50% success rate, and even then some sub-10-minute tasks are still troublesome for LLMs, as seen in the graph. But I think this will improve as well if we make the algorithms better.