Google has introduced Gemini Robotics, a new model for building robots that is based on Gemini 2.0.
In a blog post on 12 March, Google DeepMind said the new model will lay the foundation for a new generation of helpful robots.
Gemini Robotics is an advanced vision-language-action (VLA) model that adds physical actions as a new output modality, allowing it to directly control robots.
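To make the VLA idea concrete, the sketch below shows the general contract such a model follows: camera images and a language instruction go in, low-level motor commands come out, queried in a loop so the robot can react to a changing scene. This is a hypothetical illustration only; Gemini Robotics' actual interface, types, and joint layout are not described in the cited post.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical sketch of a vision-language-action (VLA) interface.
# None of these names come from Google's API; they only illustrate
# the "images + text in, actions out" structure of a VLA model.

@dataclass
class Observation:
    camera_image: bytes   # raw frame from the robot's camera
    instruction: str      # natural-language command

@dataclass
class Action:
    joint_deltas: List[float]  # per-joint position changes

def vla_policy(obs: Observation) -> Action:
    """Stand-in for a VLA model: maps a visual observation plus a
    language instruction directly to low-level robot actions."""
    # A real model would run a large vision-language backbone here;
    # this stub returns a no-op action for 14 joints, a plausible
    # count for a bi-arm platform such as ALOHA 2.
    return Action(joint_deltas=[0.0] * 14)

# The policy is called repeatedly, which is what lets such a system
# respond to instructions or changes in its environment.
obs = Observation(camera_image=b"", instruction="pick up the banana")
for step in range(3):
    action = vla_policy(obs)
    print(f"step {step}: sending {len(action.joint_deltas)} joint commands")
```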
Bringing AI into the physical world
Google has been deeply involved in AI, but up to this point its work has largely taken the form of software used online and on computers.
With the introduction of Gemini Robotics, that is changing because this model is designed specifically to power robots that assist humans with everyday tasks.
To serve this use case, the model is designed to be general, meaning it can adapt to different situations.
It is also interactive, able to understand and respond quickly to instructions or changes in its environment. Finally, it is dexterous, meaning it can do the kinds of tasks people generally do with their hands and fingers, such as carefully manipulating objects.
According to DeepMind, Gemini Robotics more than doubles performance on a comprehensive generalization benchmark compared to other state-of-the-art vision-language-action models.
The model was trained primarily on data from the bi-arm robotic platform ALOHA 2, but it can also control other bi-arm platforms, including one based on the Franka arms used in many academic labs.
It can be adapted further for more complex embodiments, such as the humanoid Apollo robot developed by Apptronik, with the goal of completing real-world tasks.
Gemini’s long journey
Gemini has been used for many purposes before now. These include creating original images in Google Docs and helping employees in different organizations carry out everyday tasks.
Clearly, Gemini Robotics takes Google's venture into AI to a whole new level, and hopefully it will perform even better in the physical world than it has in software.