As task length increases, the number of examples of attempts at that task should decrease, while the number of variables to consider in seeing why it succeeded/failed should increase. So one should expect data bottlenecks at some point insofar as “tasks” are a real unit that cuts reality at the joints.
(But I have no sense of where that data starts to get thin empirically, or how many tasks are “naturally” on a long horizon rather than just being the mere addition of doing a bunch of smaller steps well.)
As task length increases, the number of examples of attempts at that task should decrease, while the number of variables to consider in seeing why it succeeded/failed should increase. So one should expect data bottlenecks at some point insofar as “tasks” are a real unit that cuts reality at the joints.
(But I have no sense of where that data starts to get thin empirically, or how many tasks are “naturally” on a long horizon rather than just being the mere addition of doing a bunch of smaller steps well.)