Google DeepMind’s Genie
Context:
- Google DeepMind recently introduced Genie, a new model that can generate interactive video games from just a text or image prompt.
- Genie can generate games without any prior training on game mechanics (the rules, elements, and processes that make up a game).
What is Genie?
- According to the official Google DeepMind blog post, Genie is a foundation world model trained on videos sourced from the Internet.
- The model can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.
- The research paper ‘Genie: Generative Interactive Environments’ states that Genie is the first generative interactive environment which has been trained in an unsupervised manner from unlabelled internet videos.
- Genie has nearly 11B parameters and consists of three components: a spatiotemporal video tokenizer, an autoregressive dynamics model, and a simple, scalable latent action model.
- Together, these components let users act in the generated environments on a frame-by-frame basis, even though the model is trained without action labels or any other domain-specific requirements.
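The way the three components fit together at play time can be illustrated with a toy sketch. Everything in it is a hypothetical stand-in rather than Genie's actual code: `tokenize` flattens a small grid instead of running a learned video tokenizer, `dynamics_step` applies a hand-written shift instead of an autoregressive model, and the four action ids and their meanings are assumed here (in Genie the meaning of each latent action is learned, not predefined).

```python
from typing import List

Frame = List[List[int]]   # toy "image": a grid of 0/1 pixels
Tokens = List[int]        # toy "video tokens": the grid flattened row by row

NUM_LATENT_ACTIONS = 4    # assumption: a small discrete action set (hypothetical size)


def tokenize(frame: Frame) -> Tokens:
    """Stand-in for the video tokenizer: flatten the grid into a token list."""
    return [pixel for row in frame for pixel in row]


def detokenize(tokens: Tokens, width: int) -> Frame:
    """Inverse of tokenize: rebuild the grid from the flat token list."""
    return [tokens[i:i + width] for i in range(0, len(tokens), width)]


def dynamics_step(history: List[Tokens], action: int, width: int) -> Tokens:
    """Stand-in for the dynamics model: next-frame tokens from past tokens
    plus one latent action. Here the action cyclically shifts the grid:
    0 = left, 1 = right, 2 = up, 3 = down (meanings assumed for the toy)."""
    if not 0 <= action < NUM_LATENT_ACTIONS:
        raise ValueError(f"unknown latent action: {action}")
    current = detokenize(history[-1], width)
    if action == 0:
        shifted = [row[1:] + row[:1] for row in current]
    elif action == 1:
        shifted = [row[-1:] + row[:-1] for row in current]
    elif action == 2:
        shifted = current[1:] + current[:1]
    else:
        shifted = current[-1:] + current[:-1]
    return tokenize(shifted)


def play(prompt: Frame, actions: List[int]) -> List[Frame]:
    """Frame-by-frame loop: tokenize the prompt image once, then generate
    one new frame per player action and decode every frame for display."""
    width = len(prompt[0])
    history = [tokenize(prompt)]
    for action in actions:
        history.append(dynamics_step(history, action, width))
    return [detokenize(tokens, width) for tokens in history]


if __name__ == "__main__":
    prompt_image = [[1, 0, 0],
                    [0, 0, 0]]
    for frame in play(prompt_image, [1, 3]):   # move right, then down
        print(frame)
```

The point of the sketch is the control flow: a single prompt image seeds the token history, and each subsequent frame depends only on that history and the action chosen for that step.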
What does Genie do?
- According to the research paper, Genie is a new kind of generative AI that enables anyone, even children, to dream up and step into generated worlds similar to human-designed simulated environments.
- Genie can be prompted to generate a diverse set of interactive and controllable environments, even though it is trained on video-only data.
- In simple terms, numerous generative AI models can already produce creative content in the form of language, images, and even video.
- Genie is a breakthrough because it goes a step further and generates playable environments from a single image prompt.
- According to Google DeepMind, Genie can be prompted with images that it has never seen.
- These images include real-world photographs and sketches, which allows people to interact with their imagined virtual worlds.
- A model with this capability is known as a foundation world model.
- For training, the research paper notes that the authors focus on videos of 2D platformer games and robotics.
- The training method is general, so it should work on any type of domain and scale to even larger Internet datasets.
Why is it important?
- The standout aspect of Genie is its ability to learn and reproduce controls for in-game characters exclusively from internet videos.
- This is important because internet videos carry no labels indicating which action is being performed, or even which part of the image should be controlled.
- Genie learns not only which parts of an observation are generally controllable, but also infers diverse latent actions that are consistent across the generated environments.
- According to Google DeepMind, the most distinctive aspect of Genie is that it allows you to create an entirely new interactive environment from a single image.
- This opens up many possibilities, especially new ways to create and step into virtual worlds.
- To demonstrate this, the researchers created an image using the text-to-image model Imagen 2 and then used it as a prompt to create virtual worlds.
- The same can also be done with sketches.
- With Genie, anyone will now be able to create their own entirely imagined virtual worlds.
- Apart from that, the model’s ability to learn and develop new world models signals a significant leap towards general AI agents (independent programmes or entities that interact with their environment by perceiving their surroundings via sensors).
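The idea of inferring latent actions from unlabelled video, mentioned above, can be made concrete with a small sketch. It is not Genie's method: the sketch swaps in a hand-computed displacement and nearest-codebook matching for the learned latent action model, and the `CODEBOOK` values, the one-dimensional frames, and all function names are assumptions made purely for illustration.

```python
from typing import List, Sequence

# Hypothetical codebook: each latent action id stands for one displacement.
# The ids carry no human label; "left/stay/right" is only our reading of them.
CODEBOOK: List[int] = [-1, 0, 1]


def object_position(frame: Sequence[int]) -> int:
    """Index of the single lit pixel in a 1-D toy frame."""
    return list(frame).index(1)


def infer_latent_action(prev_frame: Sequence[int], next_frame: Sequence[int]) -> int:
    """Pick the codebook entry closest to the observed frame-to-frame change."""
    delta = object_position(next_frame) - object_position(prev_frame)
    return min(range(len(CODEBOOK)), key=lambda i: abs(CODEBOOK[i] - delta))


def label_video(frames: Sequence[Sequence[int]]) -> List[int]:
    """Assign one latent action id to each pair of consecutive frames."""
    return [infer_latent_action(a, b) for a, b in zip(frames, frames[1:])]


if __name__ == "__main__":
    video = [
        [1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 1, 0, 0],
        [1, 0, 0, 0],
    ]
    print(label_video(video))
```

No action labels are supplied anywhere in the sketch: the ids come entirely from how consecutive frames differ, which is the property that makes unlabelled internet video usable as training data.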