Video Editing via Factorized Diffusion Distillation
Authors: Uriel Singer, Amit Zohar, Yuval Kirstain, Shelly Sheynin, Adam Polyak, Devi Parikh, Yaniv Taigman
Published: 2024-03-14 (paper ID 2403.09334)
Emu Video Edit (EVE) uses unsupervised distillation, specifically Factorized Diffusion Distillation, to perform video editing by aligning image editing and video generation adapters without needing supervised data.
AI-generated summary
We introduce Emu Video Edit (EVE), a model that establishes a new
state of the art in video editing without relying on any supervised video
editing data. To develop EVE, we separately train an image editing adapter and a
video generation adapter, and attach both to the same text-to-image model.
Then, to align the adapters towards video editing, we introduce a new
unsupervised distillation procedure, Factorized Diffusion Distillation. This
procedure distills knowledge from one or more teachers simultaneously, without
any supervised data. We utilize this procedure to teach EVE to edit videos by
jointly distilling knowledge to (i) precisely edit each individual frame from
the image editing adapter, and (ii) ensure temporal consistency among the
edited frames using the video generation adapter. Finally, to demonstrate the
potential of our approach in unlocking other capabilities, we align additional
combinations of adapters.
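
As a rough illustration of distilling from two teachers at once, the toy Python sketch below combines a per-frame term (standing in for the image editing teacher) with a whole-clip term (standing in for the video generation teacher). It is not the paper's implementation: the squared-error losses, the function names (`factorized_loss`, `toy_image_teacher`, `toy_video_teacher`), the weights, and the stand-in teachers are all assumptions made for illustration, and real frames would be images or latents rather than short lists of floats.

```python
from typing import Callable, List

Frame = List[float]   # toy "frame": a flat list of pixel values (assumption)
Video = List[Frame]   # toy "video": a list of frames


def mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two frames of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def video_mse(a: Video, b: Video) -> float:
    """Average per-frame mean squared error between two videos."""
    return sum(mse(fa, fb) for fa, fb in zip(a, b)) / len(a)


def factorized_loss(
    student_video: Video,
    image_teacher: Callable[[Frame], Frame],
    video_teacher: Callable[[Video], Video],
    w_image: float = 1.0,
    w_video: float = 1.0,
) -> float:
    """Weighted sum of two teacher terms (hypothetical formulation).

    The image teacher sees each frame independently; the video teacher
    sees the whole clip, so only it can penalize temporal inconsistency.
    """
    image_target = [image_teacher(frame) for frame in student_video]
    video_target = video_teacher(student_video)
    image_term = video_mse(student_video, image_target)
    video_term = video_mse(student_video, video_target)
    return w_image * image_term + w_video * video_term


def toy_image_teacher(frame: Frame) -> Frame:
    """Stand-in 'edit': push every pixel to full brightness."""
    return [1.0 for _ in frame]


def toy_video_teacher(video: Video) -> Video:
    """Stand-in 'consistency': replace each frame by the temporal mean."""
    n = len(video)
    mean_frame = [sum(f[i] for f in video) / n for i in range(len(video[0]))]
    return [list(mean_frame) for _ in video]


# A flickering clip is penalized by both terms; a steady, fully edited clip by neither.
flicker = [[0.0, 0.0], [1.0, 1.0]]
steady = [[1.0, 1.0], [1.0, 1.0]]
loss_flicker = factorized_loss(flicker, toy_image_teacher, toy_video_teacher)
loss_steady = factorized_loss(steady, toy_image_teacher, toy_video_teacher)
```

In this toy setting the loss reaches zero only when every frame matches the per-frame target and the frames agree with one another, which mirrors the two goals named in the abstract: precise per-frame edits and temporal consistency.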