Does “multi-modality” include features like having a physical world model, such that it could, for instance, send sensible commands to a robot body?