In our inherently multimodal world, information is encoded in various modalities, such as text, images, and video. While deep learning has made significant strides in unimodal or bimodal tasks, the increasing adoption of large-scale Artificial Intelligence (AI) models necessitates the development of general and flexible tools for their integration.
In a new paper GATS: Gather-Attend-Scatter, a Google DeepMind research team introduces Gather-Attend-Scatter (GATS), a pioneering module designed to seamlessly combine pretrained foundation models—whether trainable or frozen—into larger multimodal networks.
The GATS module comprises multiple GATS layers, akin to vanilla transformer layers, each equipped with local attention. These layers interleave with given networks, serving as a vital bridge between them.
GATS operates by gathering activations from all component models, attending to the most relevant information, and scattering the combined representation back to all models by modifying their original activations. Its versatility lies in being applicable to any deep neural network, as it processes activations layer by layer. GATS is agnostic to the specific details of the neural networks it combines, making it a powerful and flexible tool.
The resulting GATS multimodal architectures only necessitate training the GATS module, eliminating the need to fine-tune the original pretrained models and preventing potential loss of knowledge. This renders GATS a highly versatile and general-purpose tool for building multimodal models from diverse pretrained sources.
The research team showcased the efficacy of GATS through agent experiments in diverse environments, including Atari Pong, Language-Table, and YCB. GATS demonstrated its ability to seamlessly integrate text and image models, showcasing its versatility in processing and generating various modalities. The results underscored the capability of GATS-based models to effectively leverage pretrained models and achieve state-of-the-art performance.
GATS-based architectures exhibit modularity and extendibility, paving the way for exciting possibilities in future research. This framework provides flexibility to incorporate additional modalities, leverage larger and more powerful foundation models, and explore novel applications requiring the coordination and processing of multimodal inputs.
Overall, GATS represents a significant stride in the realm of multimodal AI integration. Its ability to seamlessly combine pretrained models, eliminate the need for extensive finetuning, and exhibit modularity positions it as a versatile tool for researchers and practitioners. The demonstrated effectiveness across diverse experiments opens the door to innovative applications and inspires further exploration in the field of multimodal AI.
The paper GATS: Gather-Attend-ScatterarXiv.
Author: Hecate He | Editor: Chain Zhang
We know you don’t want to miss any news or research breakthroughs. Subscribe to our popular newsletter Synced Global AI Weekly to get weekly AI updates.

Hollywoodbet stands at the intersection of entertainment and betting, offering users a holistic online experience. With a commitment https://hollywoodbet.app/data-free/ to diversity, user satisfaction, and technological advancements, Hollywoodbet continues to shape the landscape of online entertainment and sports betting. As users embark on their Hollywoodbet journey, they find themselves in a digital arena where entertainment and the thrill of betting converge, creating an immersive and dynamic online haven.
Osh University, the premier mbbs university , offers a transformative learning experience. Nestled in an intellectually stimulating environment, students at Osh receive unparalleled medical education, shaping them into the future leaders of healthcare.
Shalamar Hospital’s urologist in Lahore is committed to delivering high-quality urological care, combining expertise with compassionate service in a modern medical environment.
With the selection of kids undergarments from Tempo Garments, discover quality and comfort. Our range, which is made with delicate fabrics and meticulous attention to detail, guarantees your childrens maximum comfort.
GATS incorporates graph attention mechanisms, A Difficult Game About Climbing which enable the model to selectively attend to relevant information from different modalities while filtering out noise or irrelevant signals.