https://github.com/KaiyueSun98/T2I-Personalization-with-AR

Personalized Text-to-Image Generation with Auto-Regressive Models
Kaiyue Sun, Xian Liu, Yao Teng, Xihui Liu (arXiv:2504.13162, published 2025-04-17)
Abstract
Auto-regressive models achieve performance comparable to that of diffusion models in personalized image synthesis, using a two-stage training strategy that optimizes text embeddings and fine-tunes transformer layers.
Personalized image synthesis has emerged as a pivotal application of text-to-image generation, enabling the creation of images that feature specific subjects in diverse contexts. While diffusion models have dominated this domain, auto-regressive models, with their unified architecture for text and image modeling, remain underexplored for personalized image generation. This paper investigates optimizing auto-regressive models for personalized image synthesis, leveraging their inherent multimodal capabilities. We propose a two-stage training strategy that combines optimization of text embeddings with fine-tuning of transformer layers. Our experiments on an auto-regressive model show that this method achieves subject fidelity and prompt following comparable to those of leading diffusion-based personalization methods. These results highlight the effectiveness of auto-regressive models for personalized image generation and suggest a new direction for future research in this area.
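The two-stage strategy can be illustrated on a toy problem. The following is a minimal sketch in plain Python, not the authors' code, and every name in it is hypothetical: `ToyARModel` is a tiny next-token predictor whose linear head `W` stands in for the transformer layers, and `personalize` assumes that stage 1 optimizes only a new subject-token embedding with the model frozen (in the spirit of textual inversion), while stage 2 fine-tunes `W` with that embedding held fixed. Whether the paper also updates the embedding during stage 2 is not stated in this abstract, so that choice is an assumption.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [v / total for v in exps]


class ToyARModel:
    """Hypothetical stand-in for an auto-regressive text-to-image model.

    Image tokens are predicted one at a time with teacher forcing:
        logits_t = W @ (e_text + E[x_{t-1}]),  with x_0 = BOS.
    W stands in for the transformer layers, E holds frozen image-token
    embeddings, and e_text is the embedding of the new subject token.
    """

    def __init__(self, vocab, dim, seed=0):
        rng = random.Random(seed)
        self.vocab, self.dim = vocab, dim
        self.W = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab)]
        # One extra row in E for the BOS token (index == vocab).
        self.E = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(vocab + 1)]

    def hidden_states(self, e_text, tokens):
        # Shift right and prepend BOS, so position t only sees x_{t-1}.
        prev = [self.vocab] + tokens[:-1]
        return [[a + b for a, b in zip(e_text, self.E[p])] for p in prev]

    def loss_and_grads(self, e_text, tokens):
        """Mean next-token cross-entropy plus its gradients w.r.t. e_text and W."""
        n = len(tokens)
        loss = 0.0
        g_e = [0.0] * self.dim
        g_W = [[0.0] * self.dim for _ in range(self.vocab)]
        for h, y in zip(self.hidden_states(e_text, tokens), tokens):
            probs = softmax([sum(w * x for w, x in zip(row, h)) for row in self.W])
            loss -= math.log(probs[y]) / n
            g = probs[:]  # dLoss/dlogits = probs - onehot(y)
            g[y] -= 1.0
            for v in range(self.vocab):
                for j in range(self.dim):
                    g_e[j] += self.W[v][j] * g[v] / n  # accumulates W^T g
                    g_W[v][j] += g[v] * h[j] / n       # accumulates g h^T
        return loss, g_e, g_W


def personalize(model, tokens, stage1_steps=100, stage2_steps=100, seed=1):
    """Hypothetical two-stage schedule: embedding first, then model weights."""
    rng = random.Random(seed)
    e = [rng.gauss(0, 0.1) for _ in range(model.dim)]  # new subject-token embedding
    losses = [model.loss_and_grads(e, tokens)[0]]

    # Stage 1: W frozen, only e is updated. The gradient in e is Lipschitz
    # with constant at most ||W||_F^2 / 2, so this step size lowers the loss
    # at every step.
    lr1 = 1.0 / sum(w * w for row in model.W for w in row)
    for _ in range(stage1_steps):
        _, g_e, _ = model.loss_and_grads(e, tokens)
        e = [a - lr1 * g for a, g in zip(e, g_e)]
    losses.append(model.loss_and_grads(e, tokens)[0])

    # Stage 2: e frozen (an assumption), W fine-tuned. The gradient in W is
    # Lipschitz with constant at most max_t ||h_t||^2 / 2, giving a safe step.
    lr2 = 1.0 / max(sum(x * x for x in h) for h in model.hidden_states(e, tokens))
    for _ in range(stage2_steps):
        _, _, g_W = model.loss_and_grads(e, tokens)
        for v in range(model.vocab):
            for j in range(model.dim):
                model.W[v][j] -= lr2 * g_W[v][j]
    losses.append(model.loss_and_grads(e, tokens)[0])
    return e, losses


# Toy run: pretend these are the discrete image tokens of one subject image.
model = ToyARModel(vocab=8, dim=4)
subject_tokens = [3, 1, 4, 1, 5, 2, 6]
e_subject, losses = personalize(model, subject_tokens)
print("loss at init / after stage 1 / after stage 2:", [round(x, 4) for x in losses])
```

The step sizes are derived from simple smoothness bounds rather than tuned, so the loss is guaranteed to fall within each stage of this toy; a real personalization run would more likely use a stochastic optimizer with tuned learning rates over many reference images and prompts.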