My understanding, very much secondhand, is that current LLMs still use a separately trained encoder to map images into the model's input space. I'm very unsure how the model weights integrate the different types of thinking, but I'm by default skeptical that it integrates cleanly into other parts of reasoning.
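For concreteness, the pattern I have in mind is something like the LLaVA-style setup: a separately pretrained vision encoder produces image features, and a small learned projector maps them into the language model's token embedding space. The sketch below is a toy illustration of that pattern only; all module names and dimensions are made up:

```python
import torch
import torch.nn as nn

# Toy illustration of the common "bolted-on" multimodal pattern:
# a separately pretrained (often frozen) vision encoder produces patch
# embeddings, and a small learned projector maps them into the LLM's
# token embedding space. All dimensions here are hypothetical.

VISION_DIM = 1024   # output width of the separate vision encoder
LLM_DIM = 4096      # the language model's token embedding width

class VisionProjector(nn.Module):
    """Maps vision-encoder features into the LLM embedding space."""
    def __init__(self):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(VISION_DIM, LLM_DIM),
            nn.GELU(),
            nn.Linear(LLM_DIM, LLM_DIM),
        )

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, VISION_DIM)
        return self.proj(patch_feats)

# The projected image "tokens" are simply concatenated with the text
# token embeddings, so the transformer attends over both, even though
# the vision features were learned under a different objective.
projector = VisionProjector()
image_feats = torch.randn(1, 256, VISION_DIM)   # stand-in encoder output
text_embeds = torch.randn(1, 32, LLM_DIM)       # stand-in text embeddings
inputs = torch.cat([projector(image_feats), text_embeds], dim=1)
print(inputs.shape)  # torch.Size([1, 288, 4096])
```

The point of the sketch is that the two modalities meet only through a thin learned adapter, which is why I'd hesitate to assume the visual representations are deeply integrated into the rest of the model's reasoning.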
That said, I'm also skeptical that this is fundamentally a hard part of the problem: simulation and generated data seem like a very tractable route to improving it, if/once model developers see it as a critical bottleneck for tens of billions of dollars in revenue.