Google DeepMind has unveiled Gemini Robotics On-Device, a significant step toward bringing sophisticated AI directly to robots. This efficient vision-language-action (VLA) model is optimized to run locally on robotic hardware, ensuring low latency and robust performance even with limited or no network connectivity. It exhibits strong general-purpose dexterity and fast task adaptation, enabling robots to follow natural-language instructions for complex tasks such as unzipping bags or folding clothes. The On-Device model demonstrates stronger visual, semantic, and behavioral generalization than other on-device models, particularly on challenging out-of-distribution tasks. Crucially, it is the first of Google DeepMind's VLA models to be made available for fine-tuning, allowing developers to quickly adapt it to new tasks with as few as 50 to 100 demonstrations. Google DeepMind is providing a Gemini Robotics SDK to trusted testers, supporting evaluation, testing in the MuJoCo physics simulator, and adaptation across various robot embodiments, including the bi-arm ALOHA, the Franka FR3, and Apptronik's Apollo humanoid. Developed with a strong emphasis on Google's AI Principles and a holistic safety approach, the release aims to accelerate the field of robotics by making powerful AI more accessible and adaptable for real-world applications.
https://deepmind.google/discover/blog/gemini-robotics-on-device-brings-ai-to-local-robotic-devices/
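Adapting a policy from 50 to 100 demonstrations is, at its core, few-shot behavior cloning: regressing demonstrated actions from observations and language instructions. The sketch below illustrates that idea only; it does not use the Gemini Robotics SDK, and the model, data shapes, and hyperparameters are all hypothetical stand-ins.

```python
# Hypothetical sketch of few-shot behavior cloning on ~100 demonstrations.
# Nothing here reflects the actual Gemini Robotics SDK; the adapter head,
# tensor shapes, and hyperparameters are illustrative assumptions only.
import torch
import torch.nn as nn

OBS_DIM, INSTR_DIM, ACT_DIM, NUM_DEMOS = 512, 256, 14, 100

# Stand-in demonstrations: encoded observations, instruction embeddings,
# and target actions (e.g., joint commands for a bi-arm manipulator).
obs = torch.randn(NUM_DEMOS, OBS_DIM)
instr = torch.randn(NUM_DEMOS, INSTR_DIM)
actions = torch.randn(NUM_DEMOS, ACT_DIM)

# A small trainable head standing in for the part of a pretrained VLA
# policy that gets adapted to the new task.
policy_head = nn.Sequential(
    nn.Linear(OBS_DIM + INSTR_DIM, 256),
    nn.ReLU(),
    nn.Linear(256, ACT_DIM),
)

optimizer = torch.optim.Adam(policy_head.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

for epoch in range(200):
    pred = policy_head(torch.cat([obs, instr], dim=-1))
    loss = loss_fn(pred, actions)  # behavior cloning: regress demo actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print(f"final imitation loss: {loss.item():.4f}")
```

With so few demonstrations, keeping the pretrained backbone frozen and training only a small head (as sketched above) is a common way to avoid overfitting while still adapting quickly to a new task.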