Google launches LLM to generate videos from text, audio input
Google has announced an LLM called VideoPoet that can turn text into videos. To showcase VideoPoet's capabilities, Google Research produced a short movie composed of several short clips generated by the model. Google explains that, for the script, it asked Bard to write a series of prompts detailing a short story about a travelling raccoon, then generated a video clip for each prompt.
Google's researchers have unveiled VideoPoet, a large language model (LLM) designed to process multimodal inputs, including text, images, video and audio, to create videos. VideoPoet uses a decoder-only architecture, allowing it to generate content for tasks it has not been explicitly trained on. Training follows the same two steps as other LLMs: pretraining and task-specific adaptation. The pretrained model serves as a foundational framework that can then be adapted for different video generation tasks, the researchers explain.
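To make the decoder-only, two-step setup concrete, here is a minimal PyTorch sketch of the idea. It is not Google's code: the vocabulary size, model dimensions, and class name are illustrative stand-ins, and it assumes video, audio, and text have already been converted into discrete tokens (VideoPoet uses tokenizers such as MAGVIT V2 and SoundStream for this).

```python
import torch
import torch.nn as nn

class DecoderOnlyVideoLM(nn.Module):
    """Autoregressive transformer over a shared multimodal token vocabulary."""
    def __init__(self, vocab_size=16384, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Causal mask: each position attends only to earlier tokens,
        # which is what "decoder-only" means in practice.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)

model = DecoderOnlyVideoLM()

# Step 1 (pretraining) and step 2 (task-specific adaptation) use the same
# next-token objective; only the training data mixture changes between them.
tokens = torch.randint(0, 16384, (2, 128))  # placeholder multimodal token ids
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
loss.backward()
```

Because the objective never changes, the same pretrained weights can be fine-tuned for any task that can be expressed as a token sequence.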
“VideoPoet is a simple modelling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator,” reads Google Research's announcement post.
Most existing video models are diffusion models, which add noise to training data and then learn to reconstruct it, and which typically train separate components for different tasks. VideoPoet instead consolidates all of its video generation capabilities into a single LLM.
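A practical consequence of the unified design is that every task reduces to sequence completion with a different prefix. The sketch below illustrates that framing; the task markers and token ids are invented for illustration and are not VideoPoet's actual token layout.

```python
# Every task becomes: complete the sequence that follows this prefix.
TASK_MARKERS = {"text_to_video": 0, "video_to_audio": 1, "stylization": 2}

def build_sequence(task, condition_tokens, target_tokens=None):
    """Assemble [task marker] + conditioning tokens (+ target during training)."""
    seq = [TASK_MARKERS[task]] + list(condition_tokens)
    if target_tokens is not None:      # training: append the ground-truth output
        seq += list(target_tokens)
    return seq                         # inference: the model completes the rest

# The same model weights serve all tasks; only the prefix differs.
print(build_sequence("text_to_video", condition_tokens=[101, 102, 103]))
print(build_sequence("stylization", condition_tokens=[880, 881]))
```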