Google launches Gemini 2.0 AI model for developers
Google announced Gemini 2.0 Flash Experimental on December 11, 2024. The experimental release lets developers build more immersive and interactive applications with Gemini. Developers can access Gemini 2.0 Flash through the Gemini API in Google AI Studio and Vertex AI, with general availability expected in early 2025.
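For developers who want to try the model right away, the snippet below is a minimal sketch of calling Gemini 2.0 Flash through the Gemini API with Google's google-genai Python SDK. The model name, API key placeholder, and prompt are illustrative; check the Gemini API documentation for the exact model identifier and SDK version available to you.

```python
# Minimal sketch: calling Gemini 2.0 Flash via the Gemini API (google-genai SDK).
# Assumes `pip install google-genai` and an API key from Google AI Studio.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # placeholder key

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental model name at launch
    contents="Summarize what native tool use means for a Gemini-powered app.",
)

print(response.text)
```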
Here are some of the key features of Gemini 2.0 Flash:
Better Performance: Gemini 2.0 Flash outperforms Gemini 1.5 Pro while maintaining the speed and efficiency developers expect from the Flash line. It brings improved multimodal understanding along with stronger text, code, video, and spatial reasoning. The improved spatial understanding produces more accurate bounding boxes for small objects in cluttered images, as well as better object identification and captioning.
New Output Modalities: Gemini 2.0 Flash can generate responses with text, audio, and images through a single API call. Generated image and audio outputs carry invisible SynthID watermarks to help combat misinformation.
Multilingual native audio output: This gives developers control over what the model says and how it says it, with eight high-quality voices across a range of languages and accents.
Native image output: This allows developers to generate images and edit them conversationally in multiple turns. It can also output interleaved text and images for use in multimodal content.
Native Tool Use: Gemini 2.0 can natively call tools such as Google Search and execute code, and it can invoke custom third-party functions through function calling; a short sketch follows this feature list. Using Google Search as a native tool can lead to more factual and comprehensive answers.
Multimodal Live API: Developers can use this to build real-time applications with streaming audio and video inputs from cameras or screens. The API supports natural conversational patterns such as interruptions and voice activity detection, and it can combine multiple tools to handle complex use cases; a connection sketch also appears below.
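As a rough illustration of native tool use, the sketch below grounds one request with Google Search and registers a plain Python function for function calling in another. The function name, its body, and the prompts are hypothetical; the configuration types come from the google-genai SDK and may change while the release is experimental.

```python
# Sketch of native tool use with the google-genai SDK (names and prompts are illustrative).
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")
MODEL = "gemini-2.0-flash-exp"

# 1) Ground a response with Google Search as a native tool.
search_config = types.GenerateContentConfig(
    tools=[types.Tool(google_search=types.GoogleSearch())]
)
grounded = client.models.generate_content(
    model=MODEL,
    contents="What were the key announcements about Gemini 2.0?",
    config=search_config,
)
print(grounded.text)

# 2) Expose a custom function; the SDK can call it automatically when the
#    model decides it is needed (hypothetical helper for illustration).
def get_order_status(order_id: str) -> str:
    """Look up the status of an order in a fictional backend."""
    return f"Order {order_id} has shipped."

fn_config = types.GenerateContentConfig(tools=[get_order_status])
answer = client.models.generate_content(
    model=MODEL,
    contents="Has order 1234 shipped yet?",
    config=fn_config,
)
print(answer.text)
```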
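For the Multimodal Live API, the following is a minimal connection sketch using the same SDK's asynchronous client. It sends a single text turn and streams the reply; the session methods shown reflect the early experimental surface and may differ in later SDK releases, and response_modalities can be switched to audio for spoken output.

```python
# Minimal Multimodal Live API sketch (experimental surface; method names may change).
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
MODEL = "gemini-2.0-flash-exp"
CONFIG = {"response_modalities": ["TEXT"]}  # "AUDIO" is also supported

async def main():
    # Open a bidirectional streaming session with the model.
    async with client.aio.live.connect(model=MODEL, config=CONFIG) as session:
        await session.send(input="Give me a one-line greeting.", end_of_turn=True)
        # Stream the model's response chunks as they arrive.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```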
Google highlighted some startups already using Gemini 2.0 Flash to prototype new experiences:
tldraw: A visual playground
Viggle: Virtual character creation and audio narration
Toonsutra: Contextual multilingual translation
Rooms: Real-time audio features
To help developers start building, Google released three starter app experiences in Google AI Studio, along with open-source code for spatial understanding, video analysis, and Google Maps exploration.
Gemini 2.0 and AI Code Assistance
Google also announced coding agents built with Gemini 2.0 that can execute tasks on behalf of the developer.
Jules, the AI-Powered Code Agent: Jules handles Python and JavaScript coding tasks, works asynchronously, and integrates with GitHub workflows. It can take on bug fixes and other time-consuming coding tasks, creating multi-step plans, modifying files, and preparing pull requests. Jules is currently available to a select group of trusted testers, with availability for other interested developers planned for early 2025.
Colab's Data Science Agent: Colab is integrating agentic capabilities powered by Gemini 2.0, allowing developers to describe analysis goals in natural language and watch a notebook take shape automatically. Developers can join the trusted tester program for early access. This feature is expected to roll out more widely in the first half of 2025.
Google plans to bring Gemini 2.0 to platforms like Android Studio, Chrome DevTools, and Firebase in the coming months. Developers can also sign up to use Gemini 2.0 Flash in Gemini Code Assist for enhanced coding assistance in popular IDEs like Visual Studio Code, IntelliJ, and PyCharm.




