\n","updatedAt":"2025-03-27T02:22:37.447Z","author":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","fullname":"AK","name":"akhaliq","type":"user","isPro":false,"isHf":true,"isHfAdmin":false,"isMod":false,"followerCount":8232}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.35581323504447937},"editors":["akhaliq"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg"],"reactions":[],"isReport":false}},{"id":"67e5fc8c046d17780e6055af","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264},"createdAt":"2025-03-28T01:34:04.000Z","type":"comment","data":{"edited":false,"hidden":false,"latest":{"raw":"This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper. \n\nThe following papers were recommended by the Semantic Scholar API \n\n* [What Are You Doing? 
A Closer Look at Controllable Human Video Generation](https://huggingface.co/papers/2503.04666) (2025)\n* [Goku: Flow Based Video Generative Foundation Models](https://huggingface.co/papers/2502.04896) (2025)\n* [CascadeV: An Implementation of Wurstchen Architecture for Video Generation](https://huggingface.co/papers/2501.16612) (2025)\n* [Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos](https://huggingface.co/papers/2502.21314) (2025)\n* [Pre-Trained Video Generative Models as World Simulators](https://huggingface.co/papers/2502.07825) (2025)\n* [QuantCache: Adaptive Importance-Guided Quantization with Hierarchical Latent and Layer Caching for Video Generation](https://huggingface.co/papers/2503.06545) (2025)\n* [Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT](https://huggingface.co/papers/2502.06782) (2025)\n\n\n Please give a thumbs up to this comment if you found it helpful!\n\n If you want recommendations for any Paper on Hugging Face checkout [this](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers) Space\n\n You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`","html":"
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
\n
The following papers were recommended by the Semantic Scholar API
Please give a thumbs up to this comment if you found it helpful!
\n
If you want recommendations for any Paper on Hugging Face checkout this Space
\n
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: \n\n@librarian-bot\n\t recommend
\n","updatedAt":"2025-03-28T01:34:04.666Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7094343304634094},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2503.20314","authors":[{"_id":"67e4b65a080a33e3955b340c","name":"WanTeam","hidden":false},{"_id":"67e4b65a080a33e3955b340e","user":{"_id":"63f1f1727ddf724fbcbc9c7e","avatarUrl":"/avatars/9e0516d9b1036c23c78f313c79872f55.svg","isPro":false,"fullname":"Ang Wang","user":"ang-annng","type":"user"},"name":"Ang Wang","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:47:28.144Z","hidden":false},{"_id":"67e4b65a080a33e3955b340f","user":{"_id":"64755ff5a51711a3b59118af","avatarUrl":"/avatars/2e899088902db94e785107c3ec2abe85.svg","isPro":false,"fullname":"Baole Ai","user":"baoleai","type":"user"},"name":"Baole Ai","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:47:49.260Z","hidden":false},{"_id":"67e4b65a080a33e3955b3410","name":"Bin Wen","hidden":false},{"_id":"67e4b65a080a33e3955b3411","user":{"_id":"6458970cab9a44f42f620a80","avatarUrl":"/avatars/f9779b0621c931f922440fec95342444.svg","isPro":false,"fullname":"chaojie mao","user":"chaojiemao","type":"user"},"name":"Chaojie Mao","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:48:02.730Z","hidden":false},{"_id":"67e4b65a080a33e3955b3412","user":{"_id":"66592c72f4124d863fd55574","avatarUrl":"/avatars/98f0d5e6ba3728e8a1164aa5188a3298.svg","isPro":false,"fullname":"Chenwei Xie","user":"chenweix7","type":"user"},"name":"Chen-Wei 
Xie","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:48:10.933Z","hidden":false},{"_id":"67e4b65a080a33e3955b3413","name":"Di Chen","hidden":false},{"_id":"67e4b65a080a33e3955b3414","name":"Feiwu Yu","hidden":false},{"_id":"67e4b65a080a33e3955b3415","user":{"_id":"67a73767282aa06f7bcaeeb1","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/J28OVrPhD0xYulWMgICmW.png","isPro":false,"fullname":"Haiming Zhao","user":"HermanZ","type":"user"},"name":"Haiming Zhao","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:48:26.135Z","hidden":false},{"_id":"67e4b65a080a33e3955b3416","user":{"_id":"651441e92c5da979038df5ee","avatarUrl":"/avatars/85cdafcccb522eced50dc9e4770b630a.svg","isPro":false,"fullname":"Jianxiao Yang","user":"Jianxiao0203","type":"user"},"name":"Jianxiao Yang","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:48:33.714Z","hidden":false},{"_id":"67e4b65a080a33e3955b3417","user":{"_id":"6274b866f978441a764b30f6","avatarUrl":"/avatars/953b1ff82f63e371a7358a85d68304cd.svg","isPro":false,"fullname":"jianyuan.zengjy","user":"filwsyl","type":"user"},"name":"Jianyuan Zeng","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:48:40.108Z","hidden":false},{"_id":"67e4b65a080a33e3955b3418","name":"Jiayu Wang","hidden":false},{"_id":"67e4b65a080a33e3955b3419","user":{"_id":"66f0e0262aee3cb7e981bbac","avatarUrl":"/avatars/f8f1e70469b5e047dc6e0e9dec6c5bc1.svg","isPro":false,"fullname":"Jingfeng Zhang","user":"jingfengzhang","type":"user"},"name":"Jingfeng Zhang","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:48:58.316Z","hidden":false},{"_id":"67e4b65a080a33e3955b341a","user":{"_id":"602f88f5e8149a962412a667","avatarUrl":"/avatars/b78f0e583df8e5d5e3365934fe5f4900.svg","isPro":false,"fullname":"Zhou","user":"Jingren","type":"user"},"name":"Jingren 
Zhou","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:49:09.146Z","hidden":false},{"_id":"67e4b65a080a33e3955b341b","user":{"_id":"627c93b2bec91eb1720b8bad","avatarUrl":"/avatars/89c31c71aa5027543ed5be0471fe1109.svg","isPro":false,"fullname":"Jinkai Wang","user":"zwsjink","type":"user"},"name":"Jinkai Wang","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:49:15.680Z","hidden":false},{"_id":"67e4b65a080a33e3955b341c","user":{"_id":"6465941d0e6c7618f615675b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6465941d0e6c7618f615675b/W4EHqlCucz_bojFLFEeV_.jpeg","isPro":false,"fullname":"Jixuan Chen","user":"Mayome","type":"user"},"name":"Jixuan Chen","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:49:25.437Z","hidden":false},{"_id":"67e4b65a080a33e3955b341d","name":"Kai Zhu","hidden":false},{"_id":"67e4b65a080a33e3955b341e","name":"Kang Zhao","hidden":false},{"_id":"67e4b65a080a33e3955b341f","name":"Keyu Yan","hidden":false},{"_id":"67e4b65a080a33e3955b3420","name":"Lianghua Huang","hidden":false},{"_id":"67e4b65a080a33e3955b3421","user":{"_id":"63b4ec15103617b0a5b3101e","avatarUrl":"/avatars/e6faad833b31ad5d892faccf621e7a34.svg","isPro":false,"fullname":"Mengyang Feng","user":"archerfmy","type":"user"},"name":"Mengyang Feng","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:50:01.919Z","hidden":false},{"_id":"67e4b65a080a33e3955b3422","user":{"_id":"66eae63f533fd44f8a8ca60b","avatarUrl":"/avatars/38cecb4c80cc7a6e63028fcb572e3a22.svg","isPro":false,"fullname":"Zhang Ningyi","user":"ZhangNy","type":"user"},"name":"Ningyi Zhang","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:50:13.628Z","hidden":false},{"_id":"67e4b65a080a33e3955b3423","name":"Pandeng 
Li","hidden":false},{"_id":"67e4b65a080a33e3955b3424","user":{"_id":"64c5182771947b03ffee931c","avatarUrl":"/avatars/478f4e06ac1bced092dde0f11963a975.svg","isPro":false,"fullname":"Wupingyu","user":"wpy1999","type":"user"},"name":"Pingyu Wu","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:50:38.625Z","hidden":false},{"_id":"67e4b65a080a33e3955b3425","user":{"_id":"642e3bcb958faf258a40e89c","avatarUrl":"/avatars/213501def37dc53032cee17e37fcc4c1.svg","isPro":false,"fullname":"Ruihang Chu","user":"Ruihang","type":"user"},"name":"Ruihang Chu","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:50:46.771Z","hidden":false},{"_id":"67e4b65a080a33e3955b3426","user":{"_id":"6790e2b74932687e24024b4a","avatarUrl":"/avatars/951f55648490e1f520483a3e425621dd.svg","isPro":false,"fullname":"Ruili","user":"RuiliFeng","type":"user"},"name":"Ruili Feng","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:51:03.191Z","hidden":false},{"_id":"67e4b65a080a33e3955b3427","name":"Shiwei Zhang","hidden":false},{"_id":"67e4b65a080a33e3955b3428","user":{"_id":"62bbf42ac9633b01802a6d45","avatarUrl":"/avatars/0fee1462d228f5e7f22d5c240900a3ad.svg","isPro":false,"fullname":"Siyang Sun","user":"sunsiyang","type":"user"},"name":"Siyang Sun","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:51:10.461Z","hidden":false},{"_id":"67e4b65a080a33e3955b3429","name":"Tao Fang","hidden":false},{"_id":"67e4b65a080a33e3955b342a","name":"Tianxing Wang","hidden":false},{"_id":"67e4b65a080a33e3955b342b","name":"Tianyi Gui","hidden":false},{"_id":"67e4b65a080a33e3955b342c","user":{"_id":"6489713d06a6dc54460725bb","avatarUrl":"/avatars/11b0e766eb8ccf67511e04a0c75e171e.svg","isPro":false,"fullname":"Tingyu Weng","user":"windmillknight","type":"user"},"name":"Tingyu Weng","status":"claimed_verified","statusLastChangedAt":"2025-04-03T13:33:13.496Z","hidden":false},{"_id":"67e4b65a080a33e3955b342d","name":"Tong 
Shen","hidden":false},{"_id":"67e4b65a080a33e3955b342e","name":"Wei Lin","hidden":false},{"_id":"67e4b65a080a33e3955b342f","name":"Wei Wang","hidden":false},{"_id":"67e4b65a080a33e3955b3430","name":"Wei Wang","hidden":false},{"_id":"67e4b65a080a33e3955b3431","user":{"_id":"623c6253389748c9f72ca287","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1654828369523-623c6253389748c9f72ca287.jpeg","isPro":false,"fullname":"wenmeng zhou","user":"wenmengzhou","type":"user"},"name":"Wenmeng Zhou","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:51:38.310Z","hidden":false},{"_id":"67e4b65a080a33e3955b3432","user":{"_id":"644240b1251730a7ee243ef3","avatarUrl":"/avatars/c4ca99739e2b6f3d3d0ca83ecc54766a.svg","isPro":false,"fullname":"wente.wang","user":"shiftc","type":"user"},"name":"Wente Wang","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:51:46.041Z","hidden":false},{"_id":"67e4b65a080a33e3955b3433","user":{"_id":"64af91eb5c17fe25cfcbebc3","avatarUrl":"/avatars/ffc6e7b6a40300e05e66f544264dddbc.svg","isPro":false,"fullname":"Wenting Shen","user":"SeventeenSSS","type":"user"},"name":"Wenting Shen","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:51:53.298Z","hidden":false},{"_id":"67e4b65a080a33e3955b3434","name":"Wenyuan Yu","hidden":false},{"_id":"67e4b65a080a33e3955b3435","user":{"_id":"642e19b26748dd4f8eea1321","avatarUrl":"/avatars/a534e61c21d2fb3c7a4c4d4dba98fafb.svg","isPro":false,"fullname":"Xianzhong Shi","user":"itutor","type":"user"},"name":"Xianzhong Shi","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:51:19.514Z","hidden":false},{"_id":"67e4b65a080a33e3955b3436","user":{"_id":"65105ab08c4b535a97052fe8","avatarUrl":"/avatars/a97862045a26a74ca33d1a47b6a1f2b4.svg","isPro":false,"fullname":"xiaominghuang","user":"xiaominghuang","type":"user"},"name":"Xiaoming 
Huang","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:52:03.599Z","hidden":false},{"_id":"67e4b65a080a33e3955b3437","name":"Xin Xu","hidden":false},{"_id":"67e4b65a080a33e3955b3438","name":"Yan Kou","hidden":false},{"_id":"67e4b65a080a33e3955b3439","name":"Yangyu Lv","hidden":false},{"_id":"67e4b65a080a33e3955b343a","name":"Yifei Li","hidden":false},{"_id":"67e4b65a080a33e3955b343b","user":{"_id":"67d39e61943a965360fbbc0c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/no-auth/-JwILFmblPdd6Sv28c1J7.png","isPro":false,"fullname":"yijing liu","user":"86diphda","type":"user"},"name":"Yijing Liu","status":"admin_assigned","statusLastChangedAt":"2025-03-27T09:52:18.647Z","hidden":false},{"_id":"67e4b65a080a33e3955b343c","name":"Yiming Wang","hidden":false},{"_id":"67e4b65a080a33e3955b343d","name":"Yingya Zhang","hidden":false},{"_id":"67e4b65a080a33e3955b343e","name":"Yitong Huang","hidden":false},{"_id":"67e4b65a080a33e3955b343f","name":"Yong Li","hidden":false},{"_id":"67e4b65a080a33e3955b3440","name":"You Wu","hidden":false},{"_id":"67e4b65a080a33e3955b3441","name":"Yu Liu","hidden":false},{"_id":"67e4b65a080a33e3955b3442","name":"Yulin Pan","hidden":false},{"_id":"67e4b65a080a33e3955b3443","name":"Yun Zheng","hidden":false},{"_id":"67e4b65a080a33e3955b3444","name":"Yuntao Hong","hidden":false},{"_id":"67e4b65a080a33e3955b3445","name":"Yupeng Shi","hidden":false},{"_id":"67e4b65a080a33e3955b3446","name":"Yutong Feng","hidden":false},{"_id":"67e4b65a080a33e3955b3447","user":{"_id":"643d278b482011f5f2bd0fae","avatarUrl":"/avatars/70b8a7ffbfa2a1c4b6f5ff5e2b96b7bf.svg","isPro":false,"fullname":"jiangzeyinzi","user":"jiangzeyinzi","type":"user"},"name":"Zeyinzi Jiang","status":"claimed_verified","statusLastChangedAt":"2025-03-31T08:14:59.180Z","hidden":false},{"_id":"67e4b65a080a33e3955b3448","user":{"_id":"647ffabf28b737d7b9462eb2","avatarUrl":"/avatars/210441fc6645d08b36ad43734108f914.svg","isPro":false,"fullname":"Zhen 
Han","user":"hanzhn","type":"user"},"name":"Zhen Han","status":"claimed_verified","statusLastChangedAt":"2025-03-31T09:55:48.498Z","hidden":false},{"_id":"67e4b65a080a33e3955b3449","name":"Zhi-Fan Wu","hidden":false},{"_id":"67e4b65a080a33e3955b344a","name":"Ziyu Liu","hidden":false}],"publishedAt":"2025-03-26T08:25:43.000Z","submittedOnDailyAt":"2025-03-27T00:52:37.426Z","title":"Wan: Open and Advanced Large-Scale Video Generative Models","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"This report presents Wan, a comprehensive and open suite of video foundation\nmodels designed to push the boundaries of video generation. Built upon the\nmainstream diffusion transformer paradigm, Wan achieves significant\nadvancements in generative capabilities through a series of innovations,\nincluding our novel VAE, scalable pre-training strategies, large-scale data\ncuration, and automated evaluation metrics. These contributions collectively\nenhance the model's performance and versatility. Specifically, Wan is\ncharacterized by four key features: Leading Performance: The 14B model of Wan,\ntrained on a vast dataset comprising billions of images and videos,\ndemonstrates the scaling laws of video generation with respect to both data and\nmodel size. It consistently outperforms the existing open-source models as well\nas state-of-the-art commercial solutions across multiple internal and external\nbenchmarks, demonstrating a clear and significant performance superiority.\nComprehensiveness: Wan offers two capable models, i.e., 1.3B and 14B\nparameters, for efficiency and effectiveness respectively. 
It also covers\nmultiple downstream applications, including image-to-video, instruction-guided\nvideo editing, and personal video generation, encompassing up to eight tasks.\nConsumer-Grade Efficiency: The 1.3B model demonstrates exceptional resource\nefficiency, requiring only 8.19 GB VRAM, making it compatible with a wide range\nof consumer-grade GPUs. Openness: We open-source the entire series of Wan,\nincluding source code and all models, with the goal of fostering the growth of\nthe video generation community. This openness seeks to significantly expand the\ncreative possibilities of video production in the industry and provide academia\nwith high-quality video foundation models. All the code and models are\navailable at https://github.com/Wan-Video/Wan2.1.","upvotes":55,"discussionId":"67e4b663080a33e3955b371a","ai_summary":"Wan, a comprehensive suite of video foundation models built on the diffusion transformer paradigm, advannces video generation by introducing a novel VAE, scalable pre-training strategies, and large-scale data curation, offering superior performance and versatility across various applications with both large and efficient models.","ai_keywords":["diffusion transformer","VAE","scalable pre-training","large-scale data curation","video generation","image-to-video","instruction-guided video editing","personal video generation","VRAM"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"66b02d0d05e2b2771bb36096","avatarUrl":"/avatars/db159dc5a150994e60bafbfbd7128658.svg","isPro":false,"fullname":"wtx","user":"shadowshadow","type":"user"},{"_id":"6342796a0875f2c99cfd313b","avatarUrl":"/avatars/98575092404c4197b20c929a6499a015.svg","isPro":false,"fullname":"Yuseung \"Phillip\" 
Lee","user":"phillipinseoul","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"646350107e9025b09bd62bab","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/646350107e9025b09bd62bab/TEOf1dZnZLE-4_-I6Eh-n.jpeg","isPro":false,"fullname":"momo","user":"wzc991222","type":"user"},{"_id":"66b2e9ed5409fe4fb5c7c24e","avatarUrl":"/avatars/c1fa329c0f9d26ee0e68d994b53f2679.svg","isPro":false,"fullname":"panda","user":"hughug774","type":"user"},{"_id":"64d4a5d19fda68f258679c83","avatarUrl":"/avatars/c958841d2432ba2698bd6bffa548cfcb.svg","isPro":false,"fullname":"Anw","user":"ang868","type":"user"},{"_id":"64a84de2eb47b3552285ef74","avatarUrl":"/avatars/114e0cc393d0aea9680f3af6d84d6f46.svg","isPro":false,"fullname":"Eni Grand","user":"Enigrand","type":"user"},{"_id":"6270324ebecab9e2dcf245de","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6270324ebecab9e2dcf245de/cMbtWSasyNlYc9hvsEEzt.jpeg","isPro":false,"fullname":"Kye Gomez","user":"kye","type":"user"},{"_id":"643be8879f5d314db2d9ed23","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/643be8879f5d314db2d9ed23/VrW2UtJ7ppOnGIYjTWd7b.png","isPro":false,"fullname":"Chen Dongping","user":"shuaishuaicdp","type":"user"},{"_id":"651f8133dbf879b8c58f5136","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/651f8133dbf879b8c58f5136/0L8Ecgi5Ietkm_DchJwE-.png","isPro":false,"fullname":"Zikai Zhou","user":"Klayand","type":"user"},{"_id":"656832dfbd65fd41ee7aa8cd","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/656832dfbd65fd41ee7aa8cd/HHkyetTqNq1wIBPipzjQA.jpeg","isPro":false,"fullname":"Zekun 
Wang","user":"kugwzk","type":"user"},{"_id":"634dffc49b777beec3bc6448","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1670144568552-634dffc49b777beec3bc6448.jpeg","isPro":false,"fullname":"Zhipeng Yang","user":"svjack","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":2}">
AI-generated summary
Wan, a comprehensive suite of video foundation models built on the diffusion transformer paradigm, advances video generation by introducing a novel VAE, scalable pre-training strategies, and large-scale data curation, offering superior performance and versatility across various applications with both large and efficient models.
This report presents Wan, a comprehensive and open suite of video foundation
models designed to push the boundaries of video generation. Built upon the
mainstream diffusion transformer paradigm, Wan achieves significant
advancements in generative capabilities through a series of innovations,
including our novel VAE, scalable pre-training strategies, large-scale data
curation, and automated evaluation metrics. These contributions collectively
enhance the model's performance and versatility. Specifically, Wan is
characterized by four key features: Leading Performance: The 14B model of Wan,
trained on a vast dataset comprising billions of images and videos,
demonstrates the scaling laws of video generation with respect to both data and
model size. It consistently outperforms existing open-source models as well
as state-of-the-art commercial solutions across multiple internal and external
benchmarks, with a clear and significant performance margin.
Comprehensiveness: Wan offers two capable models, with 1.3B and 14B
parameters, targeting efficiency and effectiveness respectively. It also covers
multiple downstream applications, including image-to-video, instruction-guided
video editing, and personal video generation, encompassing up to eight tasks.
Consumer-Grade Efficiency: The 1.3B model demonstrates exceptional resource
efficiency, requiring only 8.19 GB VRAM, making it compatible with a wide range
of consumer-grade GPUs. Openness: We open-source the entire series of Wan,
including source code and all models, with the goal of fostering the growth of
the video generation community. This openness aims to significantly expand the
creative possibilities of video production in industry and to provide academia
with high-quality video foundation models. All the code and models are
available at https://github.com/Wan-Video/Wan2.1.
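The 8.19 GB VRAM figure for the 1.3B model can be roughly sanity-checked with back-of-envelope arithmetic. The sketch below is illustrative, not the authors' measurement: the 2-bytes-per-parameter weight precision (bf16/fp16) is an assumption, and the remainder is simply attributed to activations, the VAE, the text encoder, and allocator overhead without measuring them.

```python
# Back-of-envelope VRAM estimate for a 1.3B-parameter diffusion transformer.
# Assumes bf16/fp16 weights (2 bytes per parameter) -- an assumption, since
# the report does not break down the 8.19 GB peak usage.
PARAMS = 1.3e9
BYTES_PER_PARAM = 2  # bf16/fp16

weights_gb = PARAMS * BYTES_PER_PARAM / 1024**3
print(f"weights alone: {weights_gb:.2f} GB")  # ~2.42 GB

# The reported 8.19 GB peak then leaves this much headroom for activations,
# the VAE, the text encoder, and CUDA allocator overhead on top of weights.
headroom_gb = 8.19 - weights_gb
print(f"headroom for activations/VAE/text encoder: {headroom_gb:.2f} GB")
```

The takeaway is that the weights themselves are a small fraction of the footprint, which is consistent with the model fitting on consumer GPUs with 10-12 GB of VRAM.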