\n","updatedAt":"2024-03-09T01:19:56.948Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7066870927810669},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2403.04634","authors":[{"_id":"65eaa2e62bcb457b2fd61622","user":{"_id":"642c17428f90c557f741c188","avatarUrl":"/avatars/3577c59b73f0b2904a63a976663b74fb.svg","isPro":false,"fullname":"Hitesh Kandala","user":"v1an1","type":"user"},"name":"Hitesh Kandala","status":"admin_assigned","statusLastChangedAt":"2024-03-08T09:52:17.246Z","hidden":false},{"_id":"65eaa2e62bcb457b2fd61623","user":{"_id":"641904caf9d6f1d772ec7af7","avatarUrl":"/avatars/4a63eac71eb30f70b1a0e9d4708f26c1.svg","isPro":false,"fullname":"Jianfeng Gao","user":"wyngjf","type":"user"},"name":"Jianfeng Gao","status":"admin_assigned","statusLastChangedAt":"2024-03-08T09:52:33.583Z","hidden":true},{"_id":"65eaa2e62bcb457b2fd61624","user":{"_id":"6125df7f25027fb1ea9c7a41","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1629871954341-noauth.jpeg","isPro":false,"fullname":"Jianwei Yang","user":"jw2yang","type":"user"},"name":"Jianwei Yang","status":"extracted_pending","statusLastChangedAt":"2024-03-08T05:32:24.143Z","hidden":false}],"publishedAt":"2024-03-07T16:18:28.000Z","submittedOnDailyAt":"2024-03-08T03:02:24.163Z","title":"Pix2Gif: Motion-Guided Diffusion for GIF Generation","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"We present Pix2Gif, a motion-guided diffusion model for image-to-GIF (video)\ngeneration. We tackle this problem differently by formulating the task as an\nimage translation problem steered by text and motion magnitude prompts, as\nshown in teaser fig. To ensure that the model adheres to motion guidance, we\npropose a new motion-guided warping module to spatially transform the features\nof the source image conditioned on the two types of prompts. Furthermore, we\nintroduce a perceptual loss to ensure the transformed feature map remains\nwithin the same space as the target image, ensuring content consistency and\ncoherence. In preparation for the model training, we meticulously curated data\nby extracting coherent image frames from the TGIF video-caption dataset, which\nprovides rich information about the temporal changes of subjects. After\npretraining, we apply our model in a zero-shot manner to a number of video\ndatasets. Extensive qualitative and quantitative experiments demonstrate the\neffectiveness of our model -- it not only captures the semantic prompt from\ntext but also the spatial ones from motion guidance. We train all our models\nusing a single node of 16xV100 GPUs. 
Code, dataset and models are made public\nat: https://hiteshk03.github.io/Pix2Gif/.","upvotes":18,"discussionId":"65eaa2e82bcb457b2fd6167f","ai_summary":"Pix2Gif, a motion-guided diffusion model, generates image-to-GIF videos using text and motion prompts, ensuring coherence and consistency through a perceptual loss and motion-guided warping module.","ai_keywords":["motion-guided diffusion model","image translation","motion-guided warping module","perceptual loss","TGIF video-caption dataset","zero-shot","diffusion model"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"63044b2e1dd5d3c624882d19","avatarUrl":"/avatars/ba4d387547d1d0baeca918caea680f89.svg","isPro":false,"fullname":"Patrick Kwon","user":"yj7082126","type":"user"},{"_id":"6538119803519fddb4a17e10","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6538119803519fddb4a17e10/ffJMkdx-rM7VvLTCM6ri_.jpeg","isPro":false,"fullname":"samusenps","user":"samusenps","type":"user"},{"_id":"62aaaaf55a99fb2669bcd0e3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1655352046059-noauth.jpeg","isPro":false,"fullname":"GaggiX","user":"GaggiX","type":"user"},{"_id":"658f21c3ccbc1e2cc7aafc99","avatarUrl":"/avatars/3304cc8138082adef31322f9bdf0c50f.svg","isPro":false,"fullname":"Hitesh Kandala","user":"Viani","type":"user"},{"_id":"642c17428f90c557f741c188","avatarUrl":"/avatars/3577c59b73f0b2904a63a976663b74fb.svg","isPro":false,"fullname":"Hitesh Kandala","user":"v1an1","type":"user"},{"_id":"648eb1eb59c4e5c87dc116e0","avatarUrl":"/avatars/c636cea39c2c0937f01398c94ead5dad.svg","isPro":false,"fullname":"fdsqefsgergd","user":"T-representer","type":"user"},{"_id":"63971d6a4352e45362dd6c9c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63971d6a4352e45362dd6c9c/Y5Sh1gF_CB3FdseZ9KBMP.jpeg","isPro":false,"fullname":"Ciprian Cimpan","user":"ciprian42","type":"user"},{"_id":"631a8603fac58c9c81648f04","avatarUrl":"/avatars/e197fdb2a6b8b66ce3b4018a98e8104f.svg","isPro":false,"fullname":"Erik","user":"worstpractice","type":"user"},{"_id":"6125df7f25027fb1ea9c7a41","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1629871954341-noauth.jpeg","isPro":false,"fullname":"Jianwei Yang","user":"jw2yang","type":"user"},{"_id":"65676a0a461af93fca9f2329","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65676a0a461af93fca9f2329/-CB4C1C6yLM4gRU2K5gsS.jpeg","isPro":false,"fullname":"Juan Delgadillo","user":"juandelgadillo","type":"user"},{"_id":"6512e3e542a541c175220416","avatarUrl":"/avatars/0df68a43ca072b0eb9726337849dff5e.svg","isPro":false,"fullname":"Utkarsh Agarwal","user":"utkarshagarwal","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary

Pix2Gif, a motion-guided diffusion model, generates image-to-GIF videos using text and motion prompts, ensuring coherence and consistency through a perceptual loss and motion-guided warping module.
We present Pix2Gif, a motion-guided diffusion model for image-to-GIF (video) generation. We tackle this problem differently by formulating the task as an image translation problem steered by text and motion magnitude prompts, as shown in the teaser figure. To ensure that the model adheres to motion guidance, we propose a new motion-guided warping module that spatially transforms the features of the source image conditioned on the two types of prompts; a sketch of the geometric warping step is given below.
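As a concrete illustration, here is a minimal PyTorch sketch of feature warping, assuming the module first predicts a dense flow field from the text and motion-magnitude conditioning and then resamples the source features with it. The helper name and interface are hypothetical, not the paper's actual code; only the resampling step is shown.

```python
# Minimal sketch: warp a feature map by a predicted flow field.
# Assumption: flow comes from a learned predictor conditioned on the
# text and motion-magnitude prompts (not shown here).
import torch
import torch.nn.functional as F

def warp_features(feat: torch.Tensor, flow: torch.Tensor) -> torch.Tensor:
    """Warp features (B, C, H, W) by a flow field (B, 2, H, W) in pixels."""
    b, _, h, w = feat.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(
        torch.arange(h, device=feat.device, dtype=feat.dtype),
        torch.arange(w, device=feat.device, dtype=feat.dtype),
        indexing="ij",
    )
    grid_x = xs.unsqueeze(0) + flow[:, 0]  # displaced x coordinates
    grid_y = ys.unsqueeze(0) + flow[:, 1]  # displaced y coordinates
    # Normalize to [-1, 1], the coordinate convention grid_sample expects.
    grid_x = 2.0 * grid_x / max(w - 1, 1) - 1.0
    grid_y = 2.0 * grid_y / max(h - 1, 1) - 1.0
    grid = torch.stack((grid_x, grid_y), dim=-1)  # (B, H, W, 2)
    return F.grid_sample(feat, grid, align_corners=True)
```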
Furthermore, we introduce a perceptual loss to keep the transformed feature map within the same space as the target image, ensuring content consistency and coherence; one common instantiation of such a loss is sketched below.
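A standard way to realize a perceptual loss is to compare activations of a frozen pretrained network. The sketch below uses torchvision's VGG16 purely as an illustration, since the abstract does not specify the feature extractor the paper actually uses.

```python
# Hedged sketch of a perceptual loss: MSE between deep features of the
# prediction and the target. VGG16 is an illustrative choice; inputs are
# assumed to already be normalized to the network's expected statistics.
import torch
import torch.nn as nn
from torchvision.models import vgg16, VGG16_Weights

class PerceptualLoss(nn.Module):
    def __init__(self, layer_index: int = 16):
        super().__init__()
        # Frozen VGG16 truncated at an intermediate conv layer.
        features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:layer_index]
        for p in features.parameters():
            p.requires_grad_(False)
        self.features = features.eval()

    def forward(self, pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Mean squared error in feature space rather than pixel space.
        return torch.mean((self.features(pred) - self.features(target)) ** 2)
```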
In preparation for model training, we meticulously curated data by extracting coherent image frames from the TGIF video-caption dataset, which provides rich information about the temporal changes of subjects; a plausible frame-selection heuristic is sketched after this paragraph.
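The abstract does not spell out the curation criterion, but one plausible heuristic, assuming coherence is judged by optical-flow magnitude between consecutive frames, might look like the following. The thresholds and the choice of Farneback flow are assumptions, not the paper's documented pipeline.

```python
# Keep frame pairs whose mean optical-flow magnitude lies in a moderate
# band: enough motion to be informative, but not so much that it suggests
# a scene cut. Thresholds here are illustrative.
import cv2
import numpy as np

def select_frame_pairs(frames, low=0.5, high=20.0):
    """frames: list of HxWx3 uint8 RGB arrays. Returns (i, j, magnitude) tuples."""
    kept = []
    prev_gray = cv2.cvtColor(frames[0], cv2.COLOR_RGB2GRAY)
    for i in range(1, len(frames)):
        gray = cv2.cvtColor(frames[i], cv2.COLOR_RGB2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0
        )
        mag = float(np.linalg.norm(flow, axis=-1).mean())
        if low <= mag <= high:  # coherent motion: not static, not a cut
            kept.append((i - 1, i, mag))
        prev_gray = gray
    return kept
```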
After pretraining, we apply our model in a zero-shot manner to a number of video datasets. Extensive qualitative and quantitative experiments demonstrate the effectiveness of our model: it captures not only the semantic prompt from text but also the spatial prompt from motion guidance. We train all our models on a single node of 16 V100 GPUs. Code, dataset, and models are publicly available at https://hiteshk03.github.io/Pix2Gif/.