Wang, Yaohui https://orcid.org/0009-0002-9487-6187
Chen, Xinyuan
Ma, Xin
Zhou, Shangchen
Huang, Ziqi
Wang, Yi
Yang, Ceyuan
He, Yinan
Yu, Jiashuo
Yang, Peiqing
Guo, Yuwei
Wu, Tianxing
Si, Chenyang
Jiang, Yuming
Chen, Cunjian
Loy, Chen Change
Dai, Bo
Lin, Dahua
Qiao, Yu
Liu, Ziwei
Funding for this research was provided by:
National Key R&D Program of China (2022ZD0160102)
National Natural Science Foundation of China (62102150)
Science and Technology Commission of Shanghai Municipality (23QD1400800)
Science and Technology Commission of Shanghai Municipality (23YF1461900)
Article History
Received: 29 March 2024
Accepted: 28 October 2024
First Online: 23 December 2024
Declarations
We acknowledge the ethical concerns shared with other T2I and T2V diffusion models. Our goal is to synthesize high-quality videos from text descriptions. Our approach can be used for movie production, video game creation, artistic creation, generating synthetic data for other computer vision tasks, etc. We note that our framework has the potential to introduce unintended bias as a result of the training data.