text-to-video-synthesis

DAMO Academy
About text-to-video-synthesis

This model is a multi-stage text-to-video diffusion model: it takes a text description as input and returns a video that matches the description. Only English input is supported.

Model Description

The text-to-video diffusion model consists of three sub-networks: a text feature extractor, a diffusion model that maps text features to the video latent space, and a network that maps the video latent space to the video visual space. The model has about 1.7 billion parameters in total. The diffusion model uses the Unet3D structure and generates video by iteratively denoising a video of pure Gaussian noise.
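The three-stage flow described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the function names (`encode_text`, `predict_noise`, `denoise`, `decode_latent`), the tiny latent shape, the step count, and the update rule are hypothetical stand-ins, not the model's actual networks or sampler. The only point is the control flow: encode the text, start from pure Gaussian noise in latent space, denoise iteratively, then map the latent to pixel values.

```python
import random


def encode_text(prompt):
    """Stage 1 stand-in: map a prompt to a small feature vector."""
    rng = random.Random(prompt)  # string seed, deterministic per prompt
    return [rng.uniform(-1.0, 1.0) for _ in range(4)]


def predict_noise(latent, text_features):
    """Stage 2 stand-in for the Unet3D noise predictor.

    Toy rule: the "noise" is each value's offset from a text-derived target.
    """
    target = sum(text_features) / len(text_features)
    return [[x - target for x in frame] for frame in latent]


def denoise(text_features, frames=4, size=8, steps=50, seed=0):
    """Start from pure Gaussian noise and remove a little noise per step."""
    rng = random.Random(seed)
    latent = [[rng.gauss(0.0, 1.0) for _ in range(size)] for _ in range(frames)]
    for _ in range(steps):
        noise = predict_noise(latent, text_features)
        latent = [
            [x - 0.1 * n for x, n in zip(frame, noise_frame)]
            for frame, noise_frame in zip(latent, noise)
        ]
    return latent


def decode_latent(latent):
    """Stage 3 stand-in: map latent values to 0..255 pixel intensities."""
    return [
        [max(0, min(255, int((x + 1.0) * 127.5))) for x in frame]
        for frame in latent
    ]


def generate(prompt):
    """Run the three toy stages in order and return a list of frames."""
    return decode_latent(denoise(encode_text(prompt)))


if __name__ == "__main__":
    video = generate("Robot dancing in times square.")
    print(len(video), "frames of", len(video[0]), "values each")
```

In the real model each stage is a learned network and the latent is a multi-dimensional video tensor; the sketch keeps only the order of operations.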

Intended use and scope of application

This model has a wide range of applications and can reason about and generate videos from arbitrary English text descriptions. Example input prompts are listed below (on the source page, each prompt appears above its generated video):

Robot dancing in times square.
Clown fish swimming through the coral reef.
Melting ice cream dripping down the cone.
A waterfall flowing through glacier at night.
A cat eating food out of a bowl, in style of van Gogh.
Tiny plant sprout coming out of the ground.
Hyper-realistic photo of an abandoned industrial site during a storm.
Balloon full of water exploding in extreme slow motion.
Incredibly detailed science fiction scene set on an alien planet, view of a marketplace. Pixel art.
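As a rough sketch of how such a prompt might be passed to the model, the snippet below uses the ModelScope `pipeline` interface with the model ID taken from the URL on this page. It assumes the `modelscope` package and its dependencies are installed and that the model weights can be downloaded; the helper names `build_input` and `synthesize` are hypothetical, introduced only for this example. Check the official model page for the currently supported call before relying on it.

```python
def build_input(prompt):
    """Wrap an English prompt in the dict form passed to the pipeline."""
    prompt = prompt.strip()
    if not prompt:
        raise ValueError("prompt must be a non-empty English description")
    return {"text": prompt}


def synthesize(prompt, model_id="damo/text-to-video-synthesis"):
    """Generate a video for the prompt and return the output video path.

    The imports are inside the function so that this file can be loaded
    without modelscope installed (assumption: it is installed at call time).
    """
    from modelscope.outputs import OutputKeys
    from modelscope.pipelines import pipeline

    pipe = pipeline("text-to-video-synthesis", model_id)
    return pipe(build_input(prompt))[OutputKeys.OUTPUT_VIDEO]
```

Calling `synthesize("Clown fish swimming through the coral reef.")` would then be expected to return the path of the generated video file, given a suitable GPU environment.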

Visit Official Website

https://modelscope.cn/models/damo/text-to-video-synthesis/summary
