Yeah, I’m only unsurprised because I’ve been tracking other visual reasoning tasks and already updated towards verbal intelligence of LLMs being pretty much disconnected from spatial and similar reasoning. (But the visual classes of task seem not obviously harder, and visual data generation is very feasible at scale, so I do expect reasonably rapid future progress now that it is a focus, conditional on sufficient attention from developers.)
Ah, thanks for clarifying! That’s very reasonable.