\n","updatedAt":"2024-01-31T05:07:47.717Z","author":{"_id":"6388f560c0b95b2c2937f4eb","avatarUrl":"/avatars/b04823aed97d0673e54e4fefa22b4284.svg","fullname":"Natchapol Bootmee ","name":"Natchapolb","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.21816310286521912},"editors":["Natchapolb"],"editorAvatarUrls":["/avatars/b04823aed97d0673e54e4fefa22b4284.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2401.15708","authors":[{"_id":"65b87996cc235dfda331c3b3","name":"Jianxiang Lu","hidden":false},{"_id":"65b87996cc235dfda331c3b4","user":{"_id":"6353ef7a74026d8c3af88db6","avatarUrl":"/avatars/7b1664493cfcd4e62cbd4fc23b2f2bf6.svg","isPro":false,"fullname":"chloe","user":"iue","type":"user"},"name":"Cong Xie","status":"extracted_pending","statusLastChangedAt":"2024-01-30T04:22:47.958Z","hidden":false},{"_id":"65b87996cc235dfda331c3b5","user":{"_id":"65e68d421da1fd85db9bd44a","avatarUrl":"/avatars/8f02ebd7a7bdab8386ff6962ab3b90ac.svg","isPro":false,"fullname":"BestPerformanceGuo","user":"BestPerformance","type":"user"},"name":"Hui Guo","status":"extracted_pending","statusLastChangedAt":"2024-03-05T03:13:35.430Z","hidden":false}],"publishedAt":"2024-01-28T17:11:42.000Z","submittedOnDailyAt":"2024-01-30T01:52:47.983Z","title":"Object-Driven One-Shot Fine-tuning of Text-to-Image Diffusion with\n Prototypical Embedding","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"As large-scale text-to-image generation models have made remarkable progress\nin the field of text-to-image generation, many fine-tuning methods have been\nproposed. However, these models often struggle with novel objects, especially\nwith one-shot scenarios. Our proposed method aims to address the challenges of\ngeneralizability and fidelity in an object-driven way, using only a single\ninput image and the object-specific regions of interest. To improve\ngeneralizability and mitigate overfitting, in our paradigm, a prototypical\nembedding is initialized based on the object's appearance and its class, before\nfine-tuning the diffusion model. And during fine-tuning, we propose a\nclass-characterizing regularization to preserve prior knowledge of object\nclasses. To further improve fidelity, we introduce object-specific loss, which\ncan also use to implant multiple objects. Overall, our proposed object-driven\nmethod for implanting new objects can integrate seamlessly with existing\nconcepts as well as with high fidelity and generalization. Our method\noutperforms several existing works. 
The code will be released.","upvotes":12,"discussionId":"65b87997cc235dfda331c3f1","ai_summary":"A novel fine-tuning method using prototypes and class-characterizing regularization improves object-driven generalization and fidelity in text-to-image generation models.","ai_keywords":["fine-tuning","prototypical embedding","diffusion model","class-characterizing regularization","object-specific loss"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"6266513d539521e602b5dc3a","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6266513d539521e602b5dc3a/7ZU_GyMBzrFHcHDoAkQlp.png","isPro":false,"fullname":"Ameer Azam","user":"ameerazam08","type":"user"},{"_id":"6538119803519fddb4a17e10","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6538119803519fddb4a17e10/ffJMkdx-rM7VvLTCM6ri_.jpeg","isPro":false,"fullname":"samusenps","user":"samusenps","type":"user"},{"_id":"648eb1eb59c4e5c87dc116e0","avatarUrl":"/avatars/c636cea39c2c0937f01398c94ead5dad.svg","isPro":false,"fullname":"fdsqefsgergd","user":"T-representer","type":"user"},{"_id":"61868ce808aae0b5499a2a95","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/61868ce808aae0b5499a2a95/F6BA0anbsoY_Z7M1JrwOe.jpeg","isPro":true,"fullname":"Sylvain Filoni","user":"fffiloni","type":"user"},{"_id":"62ce9520c745e38e081a804c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62ce9520c745e38e081a804c/nQzWRKCIyEona2utLzc5A.jpeg","isPro":false,"fullname":"biaggi","user":"biaggi","type":"user"},{"_id":"6079cc1c65b9d0165cb18394","avatarUrl":"/avatars/8c1f1011d9f675fc899919cf07faef68.svg","isPro":false,"fullname":"Chris Lesniewski","user":"lesniewski","type":"user"},{"_id":"6032802e1f993496bc14d9e3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6032802e1f993496bc14d9e3/w6hr-DEQot4VVkoyRIBiy.png","isPro":false,"fullname":"Omar Sanseviero","user":"osanseviero","type":"user"},{"_id":"6111ae7734f04c50afbed34b","avatarUrl":"/avatars/790f08bb65a84a33dd1ec478a79dd130.svg","isPro":false,"fullname":"Justin","user":"vltmedia","type":"user"},{"_id":"663ccbff3a74a20189d4aa2e","avatarUrl":"/avatars/83a54455e0157480f65c498cd9057cf2.svg","isPro":false,"fullname":"Nguyen Van Thanh","user":"NguyenVanThanhHust","type":"user"},{"_id":"641b754d1911d3be6745cce9","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/641b754d1911d3be6745cce9/DxjZG1XT4H3ZHF7qHxWxk.jpeg","isPro":true,"fullname":"atayloraerospace","user":"Taylor658","type":"user"},{"_id":"6798fe5985310ecdffc5f530","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6798fe5985310ecdffc5f530/9M7u5mVwhmhBdJ6-eUV8g.jpeg","isPro":false,"fullname":"LuoHe","user":"SuKIIII2","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary

A novel fine-tuning method using prototypes and class-characterizing regularization improves object-driven generalization and fidelity in text-to-image generation models.
Large-scale text-to-image generation models have made remarkable progress, and many fine-tuning methods have been proposed for them. However, these models often struggle with novel objects, especially in one-shot scenarios. Our proposed method addresses the challenges of generalizability and fidelity in an object-driven way, using only a single input image and the object-specific regions of interest. To improve generalizability and mitigate overfitting, a prototypical embedding is initialized from the object's appearance and its class before the diffusion model is fine-tuned. During fine-tuning, we propose a class-characterizing regularization that preserves prior knowledge of the object's class. To further improve fidelity, we introduce an object-specific loss, which can also be used to implant multiple objects. Overall, our object-driven method implants new objects that integrate seamlessly with existing concepts while maintaining high fidelity and generalization, and it outperforms several existing works. The code will be released.
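
To make the three components above concrete, here is a minimal sketch of the prototypical-embedding initialization, assuming a CLIP-style image encoder and a text-token embedding table. The projection layer, blend weight, and all tensor shapes are illustrative assumptions, not the paper's actual implementation:

```python
import torch

def init_prototypical_embedding(image_feats: torch.Tensor,
                                class_embedding: torch.Tensor,
                                blend: float = 0.5) -> torch.Tensor:
    """Initialize a new concept token from the object's appearance
    (patch features of its region of interest) and its class word embedding."""
    # Map image features into the text-embedding space; a single randomly
    # initialized linear projection is assumed here for simplicity.
    proj = torch.nn.Linear(image_feats.shape[-1], class_embedding.shape[-1])
    appearance = proj(image_feats).mean(dim=0)  # pool over patches
    # Prototype = blend of object appearance and class prior.
    return blend * appearance + (1.0 - blend) * class_embedding

# Dummy tensors standing in for encoder outputs.
image_feats = torch.randn(196, 1024)   # e.g. ViT patch features of the object ROI
class_embedding = torch.randn(768)     # embedding of the class word, e.g. "dog"
concept_embedding = init_prototypical_embedding(image_feats, class_embedding)
print(concept_embedding.shape)         # torch.Size([768])
```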
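
Similarly, the class-characterizing regularization can be sketched as a penalty that keeps the learned concept embedding near the frozen class embedding during fine-tuning. The L2 form and its weight below are assumptions about the general idea, not the paper's exact formulation:

```python
import torch
import torch.nn.functional as F

def class_characterizing_reg(concept_embedding: torch.Tensor,
                             class_embedding: torch.Tensor,
                             weight: float = 0.01) -> torch.Tensor:
    """Penalize drift of the concept embedding away from its class prior,
    so fine-tuning does not erase class-level knowledge."""
    # The class embedding is detached: it acts as a fixed anchor.
    return weight * F.mse_loss(concept_embedding, class_embedding.detach())
```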
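
Finally, the object-specific loss can be read as a masked variant of the usual noise-prediction objective, with extra weight inside the object's region of interest; one mask per object would allow multiple objects to be implanted. The latent-space masking and the weighting term are assumptions about the general idea:

```python
import torch
import torch.nn.functional as F

def object_specific_loss(noise_pred: torch.Tensor,
                         noise: torch.Tensor,
                         roi_mask: torch.Tensor,
                         lambda_obj: float = 1.0) -> torch.Tensor:
    """Standard diffusion loss plus an extra term restricted to the
    object's region-of-interest mask (broadcast over latent channels)."""
    base = F.mse_loss(noise_pred, noise)       # loss over the whole latent
    se = (noise_pred - noise) ** 2 * roi_mask  # squared error inside the ROI
    masked = se.sum() / roi_mask.expand_as(se).sum().clamp(min=1.0)
    return base + lambda_obj * masked

# Example with dummy latents: batch of 2, 4 latent channels, 64x64.
noise_pred = torch.randn(2, 4, 64, 64)
noise = torch.randn(2, 4, 64, 64)
roi_mask = (torch.rand(2, 1, 64, 64) > 0.5).float()
loss = object_specific_loss(noise_pred, noise, roi_mask)
```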