I’m a bit confused: aren’t there already useful results from multimodal fusion? E.g., relevant to this article, there are papers demonstrating that genomic + H&E information leads to better predictions on cancer survival outcome tasks than H&E or genomic inputs alone.
The paper you’ve attached implies that (some) experts agree that multimodality is already pretty present in existing models:
Others highlighted that AI models trained on both sequence and structure have shown promising results in bioengineering applications, suggesting that multimodal integration is not an insurmountable barrier.
Also, this is a pedantic point, but I think there is a mistake in the PDF:
Several participants noted that significant advances in dynamic modeling are already underway. New AI-driven tools (such as BioEmu, Adaptyv Bio, and improved binding prediction models) have pushed beyond the limitations of earlier models (such as AlphaFold), suggesting that dynamic modeling is progressing rapidly (Lewis et al., 2024; Cotet et al., 2025).
At least as far as I can tell, Adaptyv Bio is not a tool/method but rather a contract research organization (CRO) that can do expression/binding affinity/thermostability assays :) It’s a very good CRO and one that I have used before, but they don’t seem related to dynamic modeling.
E.g., relevant to this article, there are papers demonstrating that genomic + H&E information leads to better predictions on cancer survival outcome tasks than H&E or genomic inputs alone.
Frankly, you can find a lot of claims in the literature (and I believe some of them). But how many of these multimodal systems are currently used in the clinic? That’s the only metric that matters. I’m not even disagreeing with the premise that multimodal systems should be able to improve prognostic power in theory. But I am curious how well these systems work in practice.
On the first point, there’s a long way to go to get from the current narrow multimodal models for specific tasks to the type of general multimodal aggregation you seemed to suggest.
On the second point, thank you—I think you are correct that it’s a mistake/poorly written, and I’m checking with the coauthor who wrote that section.