Not quite. What you said is a reasonable argument, but the graph is noisy enough, and the theoretical arguments convincing enough, that I still assign >50% credence that data (number of feedback loops) should be proportional to parameters (exponent=1).
My argument is that even if the exponent is 1, the coefficient corresponding to horizon length (‘1e5 from multiple-subjective-seconds-per-feedback-loop’, as you said) is hard to estimate.
There are two ways of estimating this factor:
1. Empirically fitting scaling laws for whatever task we care about.
2. Reasoning about the nature of the task and how long the feedback loops are.
Number 1 requires a lot of experimentation, choosing the right training method, hyperparameter tuning, etc. Even OpenAI made some mistakes on those experiments. So probably only a handful of entities can accurately measure this coefficient today, and only for known training methods!
Number 2, if done naively, probably overestimates training requirements. When someone learns to run a company, a lot of the relevant feedback loops probably happen on timescales much shorter than months or years. But we don't know how to perform this decomposition of long-horizon tasks into sets of shorter-horizon tasks, how important each of the subtasks is, etc.
We can still use the bioanchors approach: pick a broad distribution over horizon lengths (short, medium, long). My argument is that outperforming bioanchors by making more refined estimates of horizon length seems too hard in practice to be worth the effort, and maybe we should lean towards shorter horizons being more relevant (because so far we have seen a lot of reduction from longer-horizon tasks to shorter-horizon learning problems, e.g. expert iteration or LLM pretraining).
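To make the argument concrete, here is a minimal sketch of why the horizon-length coefficient dominates the uncertainty even when the exponent is exactly 1. All numbers below (the parameter count, the candidate horizon lengths, and the `required_data` functional form) are illustrative assumptions, not fitted values from any actual scaling-law experiment.

```python
# Hypothetical illustration: if data (number of feedback loops) scales as
# coefficient * parameters ** exponent, then even with exponent fixed at 1,
# the horizon-length coefficient spans orders of magnitude.

def required_data(parameters, horizon_coefficient, exponent=1.0):
    """Feedback loops needed, under an assumed power-law scaling form."""
    return horizon_coefficient * parameters ** exponent

params = 1e12  # assumed parameter count, for illustration only

# A broad, bioanchors-style distribution over horizon lengths, expressed as
# subjective seconds per feedback loop. The bucket values are placeholders.
horizons = {
    "short (1 s/loop)": 1.0,
    "medium (1e3 s/loop)": 1e3,
    "long (1e5 s/loop)": 1e5,
}

for name, h in horizons.items():
    print(f"{name}: {required_data(params, h):.1e} feedback loops")
```

With the exponent pinned at 1, the estimates still differ by a factor of 1e5 across the buckets, which is the sense in which empirically measuring or theoretically pinning down the coefficient matters far more than refining the exponent.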
OK, I think we are on the same page then. Thanks.