On the first point, there’s a long way to go to get from the current narrow multimodal models for specific tasks to the type of general multimodal aggregation you seemed to suggest.
On the second point, thank you. I think you are correct that it’s either a mistake or poorly written, and I’m checking with the coauthor who wrote that section.