Google launches LLM to generate videos from text, audio input
Google has announced an LLM called VideoPoet that can turn text into videos. To showcase VideoPoet's capabilities, Google Research produced a short movie composed of several short clips generated by the model. Google explains that, for the script, it asked Bard to write a series of prompts detailing a short story about a travelling raccoon, then generated a video clip for each prompt.
Google's researchers have unveiled VideoPoet, a large language model (LLM) designed to process multimodal inputs, including text, images, video and audio, to create videos. VideoPoet uses a decoder-only architecture, allowing it to generate content for tasks it has not been explicitly trained on. Training follows the same two steps as other LLMs: pretraining and task-specific adaptation. The pretrained model serves as a foundational framework that can then be adapted for different video generation tasks, the researchers explain.
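To make the decoder-only, two-step setup concrete, here is a minimal PyTorch sketch of the idea. It is not Google's code: the vocabulary size, model dimensions, and class name are illustrative stand-ins, and it assumes video, audio, and text have already been converted into discrete tokens (VideoPoet uses tokenizers such as MAGVIT V2 and SoundStream for this).

```python
import torch
import torch.nn as nn

class DecoderOnlyVideoLM(nn.Module):
    """Autoregressive transformer over a shared multimodal token vocabulary."""
    def __init__(self, vocab_size=16384, d_model=512, n_heads=8, n_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, tokens):
        # Causal mask: each position attends only to earlier tokens,
        # which is what "decoder-only" means in practice.
        mask = nn.Transformer.generate_square_subsequent_mask(tokens.size(1))
        h = self.blocks(self.embed(tokens), mask=mask)
        return self.head(h)

model = DecoderOnlyVideoLM()

# Step 1 (pretraining) and step 2 (task-specific adaptation) use the same
# next-token objective; only the training data mixture changes between them.
tokens = torch.randint(0, 16384, (2, 128))  # placeholder multimodal token ids
logits = model(tokens[:, :-1])
loss = nn.functional.cross_entropy(
    logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1))
loss.backward()
```

Because the objective never changes, the same pretrained weights can be fine-tuned for any task that can be expressed as a token sequence.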
“VideoPoet is a simple modelling method that can convert any autoregressive language model or large language model (LLM) into a high-quality video generator,” reads Google Research's announcement post.
Most existing video models are diffusion models, which add noise to training data and then learn to reconstruct it, and which typically train separate components for different tasks. VideoPoet instead consolidates all of its video generation capabilities into a single LLM.
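A practical consequence of the unified design is that every task reduces to sequence completion with a different prefix. The sketch below illustrates that framing; the task markers and token ids are invented for illustration and are not VideoPoet's actual token layout.

```python
# Every task becomes: complete the sequence that follows this prefix.
TASK_MARKERS = {"text_to_video": 0, "video_to_audio": 1, "stylization": 2}

def build_sequence(task, condition_tokens, target_tokens=None):
    """Assemble [task marker] + conditioning tokens (+ target during training)."""
    seq = [TASK_MARKERS[task]] + list(condition_tokens)
    if target_tokens is not None:      # training: append the ground-truth output
        seq += list(target_tokens)
    return seq                         # inference: the model completes the rest

# The same model weights serve all tasks; only the prefix differs.
print(build_sequence("text_to_video", condition_tokens=[101, 102, 103]))
print(build_sequence("stylization", condition_tokens=[880, 881]))
```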