Just as OpenAI announced its AI model Sora to the world, capable of generating hyper-realistic videos from a single sentence (and sparking debate over whether Sora is a "world simulator"), Google DeepMind unveiled its own latest AI milestone: "Genie: Generative Interactive Environments." Genie is billed as a "foundation world model" that, trained on internet videos without action labels, can generate a multitude of playable, action-controllable virtual worlds from synthesized images, real photos, or even sketches.
(Source: Google DeepMind)
While OpenAI describes Sora as showing the emerging capabilities of a "world simulator," Google labels its own Genie a "foundation world model," hinting at a rivalry between the two. But what exactly gives Google the confidence to make such a claim?
According to official statements, Genie learns from publicly available internet videos. Its remarkable ability lies in autonomously identifying "controllable" actions and recognizing what different videos have in common: the various actions a player could potentially take.
(Source: Google DeepMind)
Even though Genie currently focuses on learning from 2D game footage and robot-related videos, Google is confident that Genie will be able to generate a wide range of interactive environments in the future, including those for AI games and the metaverse. This is why Google dares to call Genie a "foundation world model."
Let's take a look at some of the applications of Genie!
Initially, I thought Genie could "generate a game from a single sentence," but it isn't quite there yet. Google's paper describes a two-step process: first, its own text-to-image model, Imagen 2, generates an image, and then Genie transforms that image into playable game video. However, I believe that if Google integrates Imagen 2 and Genie into a single multimodal pipeline, converting text directly into game videos should be feasible.
First generate an image with Imagen 2, then create AI game footage with Genie. (Source: Google DeepMind)
Here's another cool feature.
Suppose you're a game artist who has drawn a concept sketch. Genie can bring your concept to life by transforming the sketch into an AI-animated in-game scene!
Start with a draft sketch. Amazing! Genie brings the scene to life.
(Source: Google DeepMind)
And here's another fun application: if you don't feel like drawing, just take a photo, and Genie can infuse it with a sense of gameplay. If the result is realistic enough, it might even serve as frame-by-frame animation reference for the metaverse.
Reminiscent of childhood memories where we'd make our toys fight each other. (Source: Google DeepMind)
Compared to OpenAI's Sora, Google's Genie may not seem as impressive, but Google still believes Genie marks a significant step on the journey toward artificial general intelligence (AGI). Virtual game worlds have long been seen as breeding grounds for AGI. Google starts by teaching AI to understand the digital world, then to create virtual-world videos, and next, to use AI to build virtual worlds and craft the metaverse. Perhaps one day we will not only enter the metaverse, but AI might also break through the virtual barrier into our world. Could our coexistence with AI begin with Genie?