A Note from Sundar Pichai, CEO of Google and Alphabet:
Information lies at the core of human progress. For over 26 years, we have focused on our mission to organize the world’s information and make it universally accessible and useful. This drive has pushed us to continuously expand the boundaries of AI, organizing input and presenting output in ways that are genuinely helpful.
When we introduced Gemini 1.0 last December, this vision took a major step forward. Designed from the ground up to be multimodal, Gemini 1.0 and 1.5 drove big advances in understanding and processing information across text, video, images, audio, and code, with multimodality and long context at their core.
Today, millions of developers are building with Gemini. It’s also helping us reimagine our products, including all seven that serve over 2 billion users each, and create new ones. NotebookLM, for instance, shows how multimodality and long-context capabilities enable transformative user experiences, and it has become a beloved tool for many.
Over the past year, we’ve invested in developing more agentic models, meaning they understand the world around you better, can think several steps ahead, and act on your behalf under your direction. Today, we’re thrilled to launch the next generation of models built for this agentic era: Gemini 2.0, our most capable model yet. With new advancements in multimodality—such as native image and audio output—and built-in tool use, Gemini 2.0 enables the creation of new AI agents, bringing us closer to our vision of a universal assistant.
Starting today, Gemini 2.0 is available to developers and trusted testers. We’re also working quickly to bring it to our products, with Gemini and Search leading the way. Furthermore, the experimental Gemini 2.0 Flash model is now available to all Gemini users. We’re also introducing Deep Research, a new feature that acts as a research assistant, leveraging advanced reasoning and long-context capabilities to explore complex topics and compile reports on your behalf. This feature is now available in Gemini Advanced.
By Demis Hassabis, CEO of Google DeepMind, and Koray Kavukcuoglu, CTO of Google DeepMind
Over the past year, we’ve continued to make incredible strides in artificial intelligence. Today, we’re releasing the first model in the Gemini 2.0 family: an experimental version of Gemini 2.0 Flash. This state-of-the-art model combines low latency with enhanced performance, making it our workhorse model.
We’re also showcasing the boundaries of our agentic research through prototypes enabled by Gemini 2.0’s core multimodal capabilities.
Gemini 2.0 Flash
Gemini 2.0 Flash builds upon the success of our most popular model, Gemini 1.5 Flash, offering improved performance with the same fast response times. Remarkably, it outperforms Gemini 1.5 Pro on key benchmarks at twice the speed. Gemini 2.0 Flash also introduces new capabilities: beyond supporting multimodal inputs like images, video, and audio, it now adds multimodal outputs and native tool use:
• Natively generated images mixed with text, and steerable multilingual text-to-speech (TTS) audio.
• Built-in tool use, such as Google Search, code execution, and third-party user-defined functions (see the sketch below).
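To make the tool-use idea concrete, here is a minimal sketch using the google-genai Python SDK. The model id, API-key handling, prompt, and the get_weather function and its schema are assumptions for illustration, not a definitive implementation.

```python
# Minimal sketch of built-in tool use with the google-genai Python SDK.
# Assumptions: API-key auth and the experimental model id "gemini-2.0-flash-exp".
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

# Built-in Google Search grounding supplied as a tool.
search_tool = types.Tool(google_search=types.GoogleSearch())

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Summarize today's top AI research news with sources.",
    config=types.GenerateContentConfig(tools=[search_tool]),
)
print(response.text)

# A third-party, user-defined function is declared the same way and passed via
# `tools`; the model then returns a function call for your code to execute.
get_weather = types.FunctionDeclaration(
    name="get_weather",  # hypothetical function, for illustration only
    description="Return the current weather for a city.",
    parameters=types.Schema(
        type="OBJECT",
        properties={"city": types.Schema(type="STRING")},
        required=["city"],
    ),
)
function_tool = types.Tool(function_declarations=[get_weather])
```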
Developer Access
Our goal is to deliver models to people quickly and safely. Over the past month, we’ve shared early experimental versions of Gemini 2.0 with developers, who have provided valuable feedback.
Gemini 2.0 Flash is now available to developers through the Gemini API in Google AI Studio and Vertex AI; multimodal input and text output are open to all developers, while text-to-speech and native image generation are limited to early-access partners. General availability, along with more model sizes, will follow in January.
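As a rough illustration of the generally available surface (multimodal input, text output), here is a hedged sketch of an image-plus-text request using the google-genai Python SDK with Google AI Studio API-key auth; the model id, local image path, and prompt are assumptions.

```python
# Hedged sketch: multimodal input, text output via the Gemini API (AI Studio).
# Assumptions: google-genai SDK, API-key auth, and a local "chart.png" image.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")  # Vertex AI uses project/location auth instead

with open("chart.png", "rb") as f:  # hypothetical local image
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental model id at launch
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Describe the main trend shown in this chart in one sentence.",
    ],
)
print(response.text)
```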
To assist developers in creating dynamic and interactive applications, we’re also releasing a new Multimodal Live API. It supports real-time audio and video streaming inputs and enables the use of multiple combined tools. Details about Gemini 2.0 Flash and the Multimodal Live API can be found on our developer blog.
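Below is a hedged sketch of opening a bidirectional session with the Multimodal Live API through the google-genai Python SDK's async client. The session methods shown (send with end_of_turn, receive) reflect early SDK releases and may differ in later versions; the model id and config are assumptions.

```python
# Hedged sketch of a Multimodal Live API session (early google-genai SDK).
# Assumptions: API-key auth, model id "gemini-2.0-flash-exp", text-only output.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    config = {"response_modalities": ["TEXT"]}  # audio output is also supported
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        # Send one user turn; real-time audio/video chunks can be streamed too.
        await session.send(input="Hello, what can you help me with?", end_of_turn=True)
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```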
Gemini 2.0 in Our AI Assistant and Beyond
Starting today, Gemini users worldwide can access the chat-optimized version of 2.0 Flash by selecting it from the model dropdown on desktop and mobile web; it will come to the Gemini mobile app soon. This new model makes the Gemini assistant even more helpful.
Early next year, we’ll expand Gemini 2.0 to more Google products.
Unlocking Agentic Experiences with Gemini 2.0
Gemini 2.0 Flash’s core features, including UI action capabilities, multimodal reasoning, long-context understanding, complex instruction-following, compositional function-calling, native tool use, and latency improvements, collectively enable a new category of agentic experiences.
AI agents offer exciting possibilities for practical applications. We’re exploring this new frontier through prototypes designed to help people accomplish tasks and get work done. Examples include:
1. Project Astra: Updates to our research prototype for a universal AI assistant.
2. Project Mariner: Exploring the future of human-agent interaction, starting with your browser.
3. Jules: An AI-powered coding assistant integrated into GitHub workflows.
Agentic Capabilities in Virtual and Physical Worlds
Gemini 2.0’s agentic capabilities extend beyond practical tasks to virtual environments and even robotics. For example:
• Gaming: Gemini 2.0-powered agents can navigate virtual game worlds and provide real-time suggestions based on on-screen actions.
• Robotics: By applying spatial reasoning, these agents show promise in assisting within physical environments.
Building Responsibly in the Agentic Era
As we advance these technologies, we’re committed to developing them responsibly, with safety and ethics as a priority. Our efforts include:
• Working with our Responsibility and Safety Committee (RSC) to identify and mitigate risks.
• Leveraging Gemini 2.0’s reasoning capabilities for AI-assisted red-teaming and safety optimization.
• Conducting rigorous evaluations of multimodal inputs and outputs for safety enhancements.
• Implementing user privacy controls, such as session deletion and memory management.
• Proactively addressing potential misuse, such as phishing and malicious instructions.
By approaching development thoughtfully and incrementally, we aim to ensure that Gemini 2.0 sets a high standard for safety, utility, and innovation in the agentic AI era.