If I want to know what level of task complexity I can give an LLM to get a guaranteed correct answer, horizon length is a good measure.