
DeepMind Unveils Genie 3, Google’s World Model Toward AGI

VIVE POST-WAVE Team • Sept. 16, 2025

7 minute read

Can generative AI create worlds, just like it generates text, images, and videos? Ever since Google launched the Genie series and Microsoft introduced WHAMM, the possibilities of AI in world modeling have expanded. Many people immediately think, "Could this be used to create games? Is this the prototype for the next generation of virtual world platforms?" Such thoughts aren't surprising, as these models resemble interactive 3D space generators.

However, the real focus of these models may not be creating entertaining spaces, but providing a training ground where AI agents can repeatedly experiment, reason, and learn. The problem with past models was a lack of consistency. The upgraded Genie 3 not only improves visuals and real-time operation but also achieves long-horizon consistency for the first time, making such training truly effective and laying a more concrete foundation for the development of Artificial General Intelligence (AGI).

Genie 3 can generate highly realistic 3D virtual spaces. (Source: Google DeepMind)


What is Genie 3? The Upgraded World Model

According to Google DeepMind, Genie 3 is a general-purpose world model that can generate interactive environments based on text prompts in real-time at 720p and 24 frames per second. Compared to its predecessor, Genie 2, it has three major advancements:

1. Real-time Interaction

Older models, like Genie 2 or WHAMM, either had delayed interactions or could only maintain scenes briefly. Genie 3 updates the screen and responds instantly after user input, making the exploration feel as real-time as a game engine.
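To make "real-time" concrete: at the stated 24 frames per second, the per-frame budget the model must hit can be worked out directly (a back-of-the-envelope check, not a published figure):

```python
# Real-time generation at 24 fps leaves roughly 41.7 ms to produce
# each 720p frame, including reacting to the latest user input.
fps = 24
budget_ms = 1000 / fps
print(f"{budget_ms:.1f} ms per frame")  # 41.7 ms per frame
```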

2. Long-horizon Consistency

During continuous interaction, Genie 3 can remember and maintain the state of previously generated scenes—even if a player returns to the same spot a minute later, the objects, lighting, and even weather remain consistent, reducing the jarring feeling of physical disarray. For example, if you walk past a tree and circle back, it will still be there, unchanged.

Genie 3 achieves spatial consistency through "auto-regressive generation," meaning the model references the previous frame it generated to create the next one, continuously building on its output.
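That loop can be sketched in a few lines. Everything here is illustrative: `ToyWorldModel` and its methods are invented stand-ins for a learned world model, not the Genie 3 API; only the autoregressive structure (each frame conditioned on a window of previous frames plus the latest action) reflects the description above.

```python
from collections import deque

class ToyWorldModel:
    """Invented stand-in for a learned world model (purely illustrative)."""
    def init_frame(self, prompt):
        return {"scene": prompt, "step": 0}

    def next_frame(self, history, action):
        # A real model would render pixels; this toy just advances state.
        last = history[-1]
        return {"scene": last["scene"], "step": last["step"] + 1, "action": action}

def generate_frames(model, prompt, actions, context_len=24):
    # Autoregressive rollout: each new frame is conditioned on a bounded
    # window of previously generated frames plus the latest user action.
    history = deque(maxlen=context_len)   # bounded memory of past frames
    frame = model.init_frame(prompt)      # first frame from the text prompt
    frames = [frame]
    for action in actions:
        history.append(frame)
        frame = model.next_frame(list(history), action)
        frames.append(frame)
    return frames
```

The bounded `deque` is one plausible way to cap how much history each generation step carries; how Genie 3 actually manages its context is not public.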


The challenge is that the amount of history the model must reference grows as a session lengthens, so remembering details from a minute ago is quite a feat. Genie 3 must balance short-term memory (where you walked a minute ago, where things are) against response speed (immediately reflecting new actions) to deliver a smooth, consistent interactive experience.

3. Promptable World Events

Beyond basic operations like observing and moving, Genie 3 introduces "promptable events." Users can change world conditions in real-time through text commands, almost like playing god. For instance, you can switch the weather, add new characters or objects, or even trigger story-like changes, expanding interaction from simple space exploration to dynamic story and scenario generation. In the demo, a brown bear suddenly appears in a meadow scene, or a flying dragon descends into the Thames in London.
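A toy sketch of how text-triggered events might mutate a running world's state. The `World` class and the `weather:`/`spawn:` command syntax are invented for illustration; a real world model would interpret free-form text rather than pattern-match commands.

```python
class World:
    """Toy world state with text-triggered events (illustrative only;
    not the Genie 3 API)."""
    def __init__(self):
        self.weather = "sunny"
        self.entities = ["meadow"]

    def apply_event(self, event_text):
        # Pattern-match two command forms to show the idea of a
        # promptable event mutating live world state.
        if event_text.startswith("weather:"):
            self.weather = event_text.split(":", 1)[1].strip()
        elif event_text.startswith("spawn:"):
            self.entities.append(event_text.split(":", 1)[1].strip())

world = World()
world.apply_event("weather: rain")
world.apply_event("spawn: brown bear")   # cf. the bear in the demo
```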


More Than Just a Game: A Testing Ground for AI Agents

While Genie 3 visually resembles an "on-the-fly interactive game," its core value lies in providing a simulated world where AI agents can repeatedly experiment and reason.

These AI agents aren't just chatbots; they're virtual "actors" with perception, decision-making, and action capabilities. Google DeepMind's previous release, SIMA (Scalable Instructable Multiworld Agent), is a prime example. SIMA is designed to receive instructions in various 3D virtual environments, observe, plan, and execute step by step. In the demo, it can be instructed to buy specific items at a market, find an exhibit in a museum, or complete complex tasks requiring multiple steps.

(Source: Google DeepMind)

In the past, agents like SIMA were often limited by environmental consistency and predictability: if a scene changed illogically in a short time, the agent's decision chain would be disrupted, preventing it from truly "learning" to handle long-term situations. Genie 3's long-horizon consistency solves this issue. Now, AI agents can perform dozens of actions in a continuously existing world and remember their processes and outcomes.

More crucially, Genie 3's Promptable World Events allow researchers to introduce new variables in real-time, such as sudden weather changes, the addition of unfamiliar characters, or even completely altering mission conditions, forcing agents to reassess strategies in uncertain scenarios. These "counterfactual scenarios" are essential for achieving AGI, as they require AI to not just follow a set script but to adapt flexibly to any possible event.
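One way to picture this setup is a standard agent-environment loop with events injected mid-episode. All classes below are invented toy stand-ins, not DeepMind's code; the point is only the structure: the agent acts on observations, an event changes the world partway through, and reward accrues only if the agent adapts.

```python
class ToyWorld:
    """Invented stand-in for a generated world (not a real API)."""
    def __init__(self, horizon=5):
        self.weather = "sunny"
        self.t = 0
        self.horizon = horizon

    def reset(self):
        self.t = 0
        return {"weather": self.weather}

    def apply_event(self, text):
        # Text-triggered event, e.g. "weather: storm"
        if text.startswith("weather:"):
            self.weather = text.split(":", 1)[1].strip()

    def step(self, action):
        self.t += 1
        # Reward only for reacting correctly to the changed condition.
        reward = 1.0 if (self.weather == "storm" and action == "shelter") else 0.0
        return {"weather": self.weather}, reward, self.t >= self.horizon

class ToyAgent:
    """Reactive policy: adapts once the injected event shows up in observations."""
    def act(self, obs):
        return "shelter" if obs["weather"] == "storm" else "explore"

def run_episode(world, agent, events, max_steps=50):
    """Agent-environment loop; `events` maps step index -> text event
    injected mid-episode, forcing the agent to reassess its strategy."""
    obs = world.reset()
    total = 0.0
    for t in range(max_steps):
        if t in events:
            world.apply_event(events[t])
        obs, reward, done = world.step(agent.act(obs))
        total += reward
        if done:
            break
    return total
```

Note the one-step lag in the toy run: the agent only sees the storm in the observation returned after the event fires, which mirrors why consistency over time matters for credit assignment.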

In the future, whether it's to develop control systems for self-driving cars or collaborative robots, or digital assistants capable of completing tasks autonomously, these "world models" will be their starting point and testing ground.


Genie 3 Might Not Be Enough Yet

While Genie 3 achieves long-horizon consistency, it still has many limitations:

  • Limited Actions: Current agents can move, observe, and interact with the environment, but the range of actions they can directly perform is still limited. Many world events need to be triggered by commands rather than completed by the agents themselves.
  • Nascent Multi-agent Interaction: Having multiple AIs act and influence each other in the same world is still a challenge. Making them coexist and interact like real crowds remains a high-difficulty task for world models.
  • Limited Duration: The enhanced consistency and coherence are a significant advancement, but they still can't support continuous tasks lasting hours or even days. For AI requiring long-term strategic planning, this is a ceiling.
  • Limited Real-world Fidelity: Even if it can generate museums or markets, these spaces don't fully correspond to real-world geography and details, leading to discrepancies in tasks requiring precise simulation.

These limitations mean that Genie 3 is still a "closed testing ground," not yet a place where AI can reside long-term and gradually accumulate experience. DeepMind has chosen to open it as a "limited research preview," letting a small number of academic institutions and creators test it, collecting feedback, and gradually stress-testing how much complexity and change this world can handle. After all, to achieve AGI, the worlds generated by models will need longer memory, more organic actions, and the ability to host multiple intelligences interacting simultaneously.


From Genie, Genie 2, WHAMM, to the current Genie 3, world models have evolved from generating videos and 3D scenes to maintaining consistent 3D spaces. Perhaps we're not too far from an AI virtual town with organic interactions.