Google DeepMind announces 'Gemini Omni,' a multimodal generative model that enables video generation and editing through natural language dialogue and reasoning capabilities.

Google DeepMind, Google's AI research and development division, has announced ' Gemini Omni ,' a new multimodal generative model that can create a variety of things from any input, including video, as part of the new Gemini model family. Gemini Omni Flash will be rolled out sequentially as the first release in the Gemini app, Google Flow, and YouTube Shorts.
Gemini Omni — Google DeepMind
https://deepmind.google/models/gemini-omni/
Gemini Omni announced
https://blog.google/intl/ja-jp/company-news/technology/gemini-omni/
Introducing Gemini Omni: Create Anything from Anything - YouTube
Gemini Omni is a model that allows for more intuitive video editing using only natural language. Because every prompt inherits the previous context, the appearance and characteristics of the characters are kept consistent, the laws of physics remain intact, and the overall flow of the scene is firmly remembered.
For example, the following video was generated by Gemini Omni in response to the prompt, 'Create a bubble art piece.'
The following video was generated with the prompt, 'When a person touches a mirror, the mirror surface spreads beautiful ripples like a liquid, and the person's arm transforms into a mirror material that reflects light.'
If you enter the following text: 'A dimly lit room. A glass sphere floats on and follows the hand. Inside the sphere is a room of black and white checkerboard, and within that is a space in which the hand holding the sphere repeats infinitely. The camera slowly zooms in on the sphere in a loop,' you will get a video like this.
Gemini Omni not only creates realistic-looking scenes, but also logically infers 'what will happen next.' Google states that by combining an intuitive understanding of the laws of physics with Gemini's historical, scientific, and cultural background knowledge, it becomes possible to create meaningful stories that go beyond mere photographic beauty.
With just a short prompt, Gemini Omni can generate visuals that break down complex and difficult ideas into easily understandable terms. For example, the following video was generated with the prompt: 'A claymation explaining protein folding. Everything must be made of clay, and no human hands should be shown during production. Stop-motion animation, accurate depiction.'
Furthermore, by using the input reference function, users can utilize images of their favorite characters, background scenes, or even hand-drawn sketches to create artwork that perfectly matches their envisioned vision.
Furthermore, Google announced that it will offer an avatar feature that allows users to create videos using their own voices. This will enable users to create a digital version of themselves and generate videos that look and sound just like them. It has been reported that a similar feature is already being piloted for a small number of YouTube Shorts users.
YouTube releases 'Live Selfie' feature for short videos, enabling the creation of videos using AI avatars that record the user's face and voice in real time - GIGAZINE
Furthermore, all videos created with Gemini Omni will have the ' SynthID ' watermarking technology embedded within them. You can easily check whether a video was generated by Gemini Omni through the Gemini app, Gemini in Chrome, or Google search.
Gemini Omni Flash will be rolled out gradually to all Google AI Plus, Pro, and Ultra users worldwide starting May 20, 2026, through the Gemini app and Google Flow. It will also be rolled out free of charge to YouTube Shorts and YouTube Create app users. Furthermore, it is planned to be made available to developers and enterprises via API by June 2026.
Related Posts:







