Google DeepMind’s Genie

Context:

Google DeepMind recently introduced Genie, a new model that can generate interactive video games from a single text or image prompt.
Genie can generate games without any prior training on game mechanics (essentially the rules, elements, and processes that make up a game).

What is Genie?

According to the official Google DeepMind blog post, Genie is a foundation world model trained on videos sourced from the Internet.
The model can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.
The research paper ‘Genie: Generative Interactive Environments’ states that Genie is the first generative interactive environment trained in an unsupervised manner from unlabelled internet videos.
In terms of size, Genie has about 11B parameters and consists of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple and scalable latent action model.
Together, these components let users act in the generated environments on a frame-by-frame basis, even though the model was trained without action labels or any other domain-specific requirements.
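
To make the frame-by-frame idea concrete, here is a minimal toy sketch in Python of how the components described above could fit together at play time. It is not DeepMind's implementation: the functions tokenize, detokenize, dynamics and rollout are hypothetical names, the "frame" is just a short list of numbers, and the "dynamics model" is a hand-written rule rather than a trained network. The latent action is simply an integer chosen by the caller.

```python
from typing import List

Frame = List[int]   # toy "image": a flat list of pixel values in 0..255
Tokens = List[int]  # toy discrete tokens

def tokenize(frame: Frame) -> Tokens:
    """Stand-in for the video tokenizer: quantize each pixel into one of 4 tokens."""
    return [min(p // 64, 3) for p in frame]

def detokenize(tokens: Tokens) -> Frame:
    """Stand-in decoder: map each token back to a representative pixel value."""
    return [t * 64 for t in tokens]

def dynamics(history: List[Tokens], action: int) -> Tokens:
    """Stand-in for the dynamics model (assumption: a fixed rule, not a learned one).

    Predicts the next frame's tokens from the most recent frame and a latent
    action by rotating the tokens to the right by `action` positions.
    """
    last = history[-1]
    shift = action % len(last)
    return last[-shift:] + last[:-shift] if shift else list(last)

def rollout(prompt: Frame, actions: List[int]) -> List[Frame]:
    """Generate one new frame per action, feeding each prediction back in."""
    history = [tokenize(prompt)]
    frames = []
    for action in actions:
        next_tokens = dynamics(history, action)
        history.append(next_tokens)          # autoregressive: output becomes context
        frames.append(detokenize(next_tokens))
    return frames

frames = rollout([0, 64, 128, 192], actions=[1, 0])
```

The point of the sketch is the loop: a single prompt image is turned into tokens once, and every later frame is predicted from the earlier ones plus the action the player picks at that step.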

What does Genie do?

According to the research paper, Genie is a new kind of generative AI that enables anyone, even children, to dream up and step into generated worlds similar to human-designed simulated environments.
Genie can be prompted to generate a diverse set of interactive and controllable environments even though it is trained on video-only data.
In simple words, we have already seen numerous generative AI models that can produce creative content in the form of language, images and even videos.
Genie is a breakthrough because it goes a step further and creates playable environments from a single image prompt.
According to Google DeepMind, Genie can be prompted with images that it has never seen.
These images include real-world photographs and sketches, allowing people to interact with their imagined virtual worlds.
This kind of model is known as a foundation world model.
When it comes to training, the research paper highlights that the researchers focused on videos of 2D platformer games and robotics.
However, the method used to train Genie is general, so it should work on any type of domain and scale to even larger Internet datasets.

Why is it important?

The standout aspect of Genie is its ability to learn and reproduce controls for in-game characters exclusively from internet videos.
This is important because internet videos do not have labels about the action that is performed in the video, or even which part of the image should be controlled.
Genie not only learns which parts of an observation are generally controllable, but also infers diverse latent actions that are consistent across the generated environments.
According to Google DeepMind, the most distinctive aspect of Genie is that it allows you to create an entirely new interactive environment from a single image.
This opens up many possibilities, especially new ways to create and step into virtual worlds.
To demonstrate this, the researchers created an image using the text-to-image model Imagen 2 and then used it as a prompt to create a virtual world.
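
The idea of learning actions from unlabelled video can be illustrated with a deliberately simple Python sketch. It is a toy under stated assumptions, not the method from the paper: the "video" is just a character's position in each frame, and the hypothetical function infer_latent_actions gives each distinct frame-to-frame change its own action id, whereas Genie learns its latent actions with a neural network.

```python
from typing import Dict, List, Tuple

def infer_latent_actions(positions: List[int]) -> Tuple[List[int], Dict[int, int]]:
    """Assign a discrete latent action id to each transition in an unlabelled trajectory.

    `positions` holds the character's position in consecutive frames. No action
    labels are given; ids are invented in the order new kinds of change appear.
    """
    codebook: Dict[int, int] = {}   # observed change between frames -> latent action id
    actions: List[int] = []
    for prev, curr in zip(positions, positions[1:]):
        delta = curr - prev
        if delta not in codebook:
            codebook[delta] = len(codebook)   # first time this change is seen
        actions.append(codebook[delta])
    return actions, codebook

# Frames show the character moving right twice, standing still, then moving left.
actions, codebook = infer_latent_actions([0, 1, 2, 2, 1])
```

Here the same kind of movement always receives the same id, which is a small-scale picture of what it means for latent actions to be consistent: the ids carry no human-given names such as "left" or "right", yet they can still be used as controls.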
The same can also be done with sketches.
With Genie, anyone will now be able to create their own entirely imagined virtual worlds.
Apart from that, the model’s ability to learn and develop new world models signals a significant leap towards general AI agents (an AI agent is an independent programme or entity that interacts with its environment by perceiving its surroundings via sensors).
