Wang, Yaohui https://orcid.org/0009-0002-9487-6187
Chen, Xinyuan
Ma, Xin
Zhou, Shangchen
Huang, Ziqi
Wang, Yi
Yang, Ceyuan
He, Yinan
Yu, Jiashuo
Yang, Peiqing
Guo, Yuwei
Wu, Tianxing
Si, Chenyang
Jiang, Yuming
Chen, Cunjian
Loy, Chen Change
Dai, Bo
Lin, Dahua
Qiao, Yu
Liu, Ziwei
Funding for this research was provided by:
National Key R&D Program of China (2022ZD0160102)
National Natural Science Foundation of China (62102150)
Science and Technology Commission of Shanghai Municipality (23QD1400800)
Science and Technology Commission of Shanghai Municipality (23YF1461900)
Article History
Received: 29 March 2024
Accepted: 28 October 2024
First Online: 23 December 2024
Declarations
We acknowledge the ethical concerns shared with other T2I and T2V diffusion models. Our goal is to synthesize high-quality videos from text descriptions. Our approach can be used for movie production, video game creation, artistic creation, generating synthetic data for other computer vision tasks, etc. We note that our framework has the potential to introduce unintended bias as a result of the training data.