DeepMind’s Gato: Generalist Agent

From the abstract, emphasis mine:

The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.
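To make the multi-modal/multi-embodiment claim concrete: as I understand the paper, everything (text, image patches, proprioception, actions) gets serialized into one flat sequence of tokens from a shared vocabulary, and a single transformer just predicts the next token. Here is a rough sketch of that kind of serialization; this is my own illustration, not the paper’s code, and the vocab sizes, binning scheme, and function names are assumptions.

```python
# Minimal sketch (not DeepMind's code): serialize every modality into integer
# tokens from one shared vocabulary, so a single autoregressive model can emit
# text, button presses, or joint torques depending on what the context calls for.
import numpy as np

TEXT_VOCAB = 32_000          # assumed text vocabulary size
NUM_CONTINUOUS_BINS = 1_024  # assumed number of bins for continuous values

def tokenize_text(token_ids):
    # Text is already discrete; keep it in the [0, TEXT_VOCAB) range.
    return list(token_ids)

def tokenize_continuous(values, low=-1.0, high=1.0):
    # Continuous observations/actions (e.g. joint torques) are clipped, binned,
    # and offset past the text vocabulary so the ranges don't collide.
    clipped = np.clip(values, low, high)
    bins = ((clipped - low) / (high - low) * (NUM_CONTINUOUS_BINS - 1)).astype(int)
    return [TEXT_VOCAB + int(b) for b in bins]

def tokenize_discrete_action(action_id, num_actions=18):
    # Button presses (e.g. Atari) are small integers with their own range.
    assert 0 <= action_id < num_actions
    return [TEXT_VOCAB + NUM_CONTINUOUS_BINS + action_id]

# One training example is just a flat token sequence, whatever the task:
episode = (
    tokenize_text([17, 512, 8])                # a text prompt
    + tokenize_continuous([0.03, -0.7, 0.41])  # proprioception readings
    + tokenize_discrete_action(4)              # the action to predict
)
print(episode)
```

Once everything lives in one vocabulary, “deciding based on its context whether to output text, joint torques, button presses, or other tokens” is all the same operation: sampling the next token.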

(Will edit to add more as I read. ETA: 1a3orn posted first.)

  1. It’s only 1.2 billion parameters. (!!!) They say this size was chosen to keep latency low enough for real-time control in the robot task.

  2. It was trained offline, purely supervised, but could in principle be trained online, with RL, etc. (See the sketch after this list for what the supervised objective looks like.)

  3. Performance results: see the summary figure in the paper, which shows how many of the tasks Gato reaches various fractions of expert score on.
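On point 2, here is the promised sketch. Because training is offline and purely supervised, the objective is basically next-token prediction (behavior cloning) on logged demonstration data, with the loss masked so the model is only graded on the tokens you actually want it to produce (e.g. actions and text, not observations). This is my own notation, not the paper’s code; the shapes and names are assumptions.

```python
# Rough sketch of a masked next-token (behavior-cloning) loss in PyTorch.
import torch
import torch.nn.functional as F

def masked_next_token_loss(logits, tokens, target_mask):
    """
    logits:      (batch, seq_len, vocab) float tensor of model outputs
    tokens:      (batch, seq_len) long tensor, the serialized episode
    target_mask: (batch, seq_len) tensor, 1 where the token is a target
                 (e.g. an action token), 0 elsewhere (e.g. observations)
    """
    # Predict token t+1 from tokens <= t, as in any decoder-only LM.
    pred = logits[:, :-1, :]
    target = tokens[:, 1:]
    mask = target_mask[:, 1:].float()

    per_token = F.cross_entropy(
        pred.reshape(-1, pred.size(-1)),
        target.reshape(-1),
        reduction="none",
    )
    return (per_token * mask.reshape(-1)).sum() / mask.sum().clamp(min=1)

# Toy call with random tensors (hypothetical shapes, just to show usage):
B, T, V = 2, 16, 33_042
logits = torch.randn(B, T, V)
tokens = torch.randint(0, V, (B, T))
target_mask = torch.zeros(B, T)
target_mask[:, -4:] = 1  # pretend only the last few tokens are actions
print(masked_next_token_loss(logits, tokens, target_mask))
```

Since nothing in this objective requires interacting with an environment, the same model could in principle be fine-tuned later with online data or RL, as point 2 notes.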

The section on broader implications is interesting. Selected quote:

In addition, generalist agents can take actions in the physical world, posing new challenges that may require novel mitigation strategies. For example, physical embodiment could lead to users anthropomorphizing the agent, leading to misplaced trust in the case of a malfunctioning system, or be exploitable by bad actors. Additionally, while cross-domain knowledge transfer is often a goal in ML research, it could create unexpected and undesired outcomes if certain behaviors (e.g. arcade game fighting) are transferred to the wrong context. The ethics and safety considerations of knowledge transfer may require substantial new research as generalist systems advance. Technical AGI safety (Bostrom, 2017) may also become more challenging when considering generalist agents that operate in many embodiments. For this reason, preference learning, uncertainty modeling and value alignment (Russell, 2019) are especially important for the design of human-compatible generalist agents. It may be possible to extend some of the value alignment approaches for language (Kenton et al., 2021; Ouyang et al., 2022) to generalist agents. However, even as technical solutions are developed for value alignment, generalist systems could still have negative societal impacts even with the intervention of well-intentioned designers, due to unforeseen circumstances or limited oversight (Amodei et al., 2016). This limitation underscores the need for a careful design and a deployment process that incorporates multiple disciplines and viewpoints.

They also do some scaling analysis and yup, you can make it smarter by making it bigger.

What do I think about all this?

Eh, I guess it was already priced in. I think most people in the AI safety community, myself included, would have predicted this. I’m a bit surprised that it works as well as it does with only 1.2B parameters, though.