we’ve jumped to near human-baseline and slowed to a crawl at this level
A possible reason for that might be the fallibility of our benchmarks. It might be the case that for complex tasks, it’s hard for humans to see farther than their nose.
A possible reason for that might be the fallibility of our benchmarks. It might be the case that for complex tasks, it’s hard for humans to see farther than their nose.