Papers
arxiv:2403.06269

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing

Published on Mar 10, 2024
Authors: Youyuan Zhang, Xuan Ju, James J. Clark

Abstract

FastVideoEdit is an efficient zero-shot video editing method that leverages Consistency Models to achieve fast, high-quality video editing with strong preservation ability.

AI-generated summary

Diffusion models have demonstrated remarkable capabilities in text-to-image and text-to-video generation, opening up possibilities for video editing based on textual input. However, the computational cost of sequential sampling in diffusion models poses challenges for efficient video editing. Existing approaches that rely on image generation models for video editing suffer from time-consuming one-shot fine-tuning, additional condition extraction, or DDIM inversion, making real-time applications impractical. In this work, we propose FastVideoEdit, an efficient zero-shot video editing approach inspired by Consistency Models (CMs). By leveraging the self-consistency property of CMs, we eliminate the need for time-consuming inversion or additional condition extraction, reducing editing time. Our method enables direct mapping from source video to target video with strong preservation ability by using a special variance schedule. This yields a speed advantage, since fewer sampling steps can be used while maintaining comparable generation quality. Experimental results validate the state-of-the-art performance and speed advantages of FastVideoEdit across evaluation metrics covering editing speed, temporal consistency, and text-video alignment.
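
For readers unfamiliar with the term, the self-consistency property mentioned in the abstract can be written as below. This is the general definition from the consistency-models literature, not FastVideoEdit's own formulation. The symbols (a consistency function f_θ, a noisy sample x_t at time t, a minimum time ε, and a maximum time T) are notation assumed here and do not appear in the abstract.

```latex
% Self-consistency: every point on the same probability-flow ODE
% trajectory is mapped to the same output.
f_\theta(x_t, t) = f_\theta(x_{t'}, t')
  \quad \text{for all } t, t' \in [\epsilon, T]

% Boundary condition: at the smallest time the function is the identity.
f_\theta(x_\epsilon, \epsilon) = x_\epsilon
```

Under this property, a single evaluation of f_θ maps a noisy sample to the start of its trajectory. This is consistent with the abstract's claim that fewer sampling steps are needed.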
