Note that the REBench correlation definitionally has to be 0 because all tasks have the same length. SWAA similarly has range restriction, though not as severe.
Well, the REBench tasks don’t all have the same length, at least in the data METR is using. It’s all tightly clustered around 8 hours though, so I take your point that it’s not a very meaningful correlation.
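For anyone who wants to see the range-restriction effect concretely, here is a minimal sketch on purely synthetic data (not METR's data; the variable names, the slope, and the noise level are all made up for illustration). It compares the Pearson correlation over a wide range of "task lengths" against the same relationship seen only through a narrow window, and also shows that when every task has exactly the same length the correlation is undefined, since the variance is zero.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns NaN when either variable has zero variance."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined without variation
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical data: x stands in for log task length, y for a score that
# falls with length plus unit-variance noise. Neither comes from a real benchmark.
xs = [rng.uniform(0.0, 10.0) for _ in range(5000)]
ys = [-x + rng.gauss(0.0, 1.0) for x in xs]

# Same underlying relationship, measured over the full range of x.
r_full = pearson(xs, ys)

# Same relationship, but only tasks in a narrow band of lengths are observed.
narrow = [(x, y) for x, y in zip(xs, ys) if 4.5 <= x <= 5.5]
r_narrow = pearson([x for x, _ in narrow], [y for _, y in narrow])

# Degenerate case: every task has identical length.
r_constant = pearson([8.0, 8.0, 8.0], [0.2, 0.5, 0.9])

print(f"full range:   r = {r_full:.2f}")
print(f"narrow range: r = {r_narrow:.2f}")
print(f"constant x:   r = {r_constant}")
```

The narrow-window correlation comes out much weaker than the full-range one even though the data-generating process is identical, which is the sense in which a correlation computed on tightly clustered task lengths is not very informative.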