Why do you think METR hasn’t built more tasks then, if it’s easy?
I have no idea, I just don’t think the “actually making the tasks” part can be the limiting factor.
I take it you have a negative opinion of them?
Yes; I also have a positive opinion of them, and various neutral opinions of them.
(My position could be summed up as “the concept of time horizons was really good & important, and their work is net positive, but it could use much stronger methodological underpinning and is currently being leaned on too heavily by too many people”; I’m given to understand that’s also their position on themselves.)
OK. Yeah, that’s my opinion too. Maybe I am one of the people leaning too heavily on their work. The problem is, there isn’t much else to go on. “The worst benchmark for predicting AGI, except for all the others.”
Fwiw I think the AI Village is at least as good a benchmark for predicting AGI! Of course it’s harder to quantify progress in the village, but it’s very helpful for developing intuitions.
I like your raises!