I think there are two models that you measured time horizon for, Claude 3 Opus and GPT-4 Turbo, that didn’t make it onto the main figure. Is that right? There are 13 models in Figure 5, which shows the time horizon curves for a bunch of models across the full test suite, but only 11 dots on Figure 1.