arxiv:2504.13162

Personalized Text-to-Image Generation with Auto-Regressive Models

Published on Apr 17, 2025 · Submitted by Kaiyue Sun on Apr 23, 2025
Authors: Kaiyue Sun, Xian Liu, Yao Teng, Xihui Liu

Abstract

AI-generated summary: Auto-regressive models achieve comparable performance to diffusion models in personalized image synthesis through a two-stage training strategy that optimizes text embeddings and fine-tunes transformer layers.

Personalized image synthesis has emerged as a pivotal application in text-to-image generation, enabling the creation of images featuring specific subjects in diverse contexts. While diffusion models have dominated this domain, auto-regressive models, with their unified architecture for text and image modeling, remain underexplored for personalized image generation. This paper investigates the potential of optimizing auto-regressive models for personalized image synthesis, leveraging their inherent multimodal capabilities to perform this task. We propose a two-stage training strategy that combines optimization of text embeddings and fine-tuning of transformer layers. Our experiments on the auto-regressive model demonstrate that this method achieves comparable subject fidelity and prompt following to the leading diffusion-based personalization methods. The results highlight the effectiveness of auto-regressive models in personalized image generation, offering a new direction for future research in this area.
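
To make the two-stage idea concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation. It assumes that stage one optimizes only the embedding of a new subject token while the model weights stay frozen, and that stage two then fine-tunes the model weights with the embedding held fixed; the abstract does not state the ordering or what is frozen in each stage. The "model" is a single dot product, and every name and hyperparameter (`two_stage_train`, `lr`, the step counts) is hypothetical.

```python
# Toy illustration of a two-stage schedule (NOT the paper's code).
# e: embedding of a new subject token; w: stand-in for transformer weights.
# All names, values, and the freezing scheme are assumptions for illustration.

def predict(w, e):
    """Toy model: a dot product between weights and the token embedding."""
    return sum(wi * ei for wi, ei in zip(w, e))

def loss(w, e, target):
    """Squared error between the toy prediction and a scalar target."""
    return (predict(w, e) - target) ** 2

def sgd_step(w, e, target, lr, train_embedding, train_weights):
    """One gradient step; only the parameter groups flagged True are updated."""
    err = 2.0 * (predict(w, e) - target)
    grad_e = [err * wi for wi in w]  # d(loss)/d(e)
    grad_w = [err * ei for ei in e]  # d(loss)/d(w)
    if train_embedding:
        e = [ei - lr * g for ei, g in zip(e, grad_e)]
    if train_weights:
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
    return w, e

def two_stage_train(w, e, target, stage1_steps=50, stage2_steps=50, lr=0.05):
    # Stage 1 (assumed): optimize the embedding only, weights frozen.
    for _ in range(stage1_steps):
        w, e = sgd_step(w, e, target, lr, train_embedding=True, train_weights=False)
    w_stage1, e_stage1 = list(w), list(e)
    # Stage 2 (assumed): fine-tune the weights, embedding frozen.
    for _ in range(stage2_steps):
        w, e = sgd_step(w, e, target, lr, train_embedding=False, train_weights=True)
    return w, e, w_stage1, e_stage1

w_init = [0.5, -0.3, 0.8]
e_init = [0.1, 0.1, 0.1]
target = 2.0
w_final, e_final, w_stage1, e_stage1 = two_stage_train(w_init, e_init, target)
```

In a real auto-regressive model the same schedule would typically be expressed by toggling which parameter groups receive gradient updates in each stage; the toy above only shows that bookkeeping, not the image-token training objective.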

Community

Kaiyue Sun (paper author, paper submitter):

This paper investigates the potential of optimizing auto-regressive models for personalized image synthesis, leveraging their inherent multimodal capabilities to perform this task. We propose a two-stage training strategy that combines optimization of text embeddings and fine-tuning of transformer layers. Our experiments on the auto-regressive model demonstrate that this method achieves comparable subject fidelity and prompt following to the leading diffusion-based personalization methods. The results highlight the effectiveness of auto-regressive models in personalized image generation, offering a new direction for future research in this area.
GitHub: https://github.com/KaiyueSun98/T2I-Personalization-with-AR

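This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* Proxy-Tuning: Tailoring Multimodal Autoregressive Models for Subject-Driven Image Generation (2025), https://huggingface.co/papers/2503.10125
* Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation (2025), https://huggingface.co/papers/2504.02612
* Towards More Accurate Personalized Image Generation: Addressing Overfitting and Evaluation Bias (2025), https://huggingface.co/papers/2503.06632
* ConceptGuard: Continual Personalized Text-to-Image Generation with Forgetting and Confusion Mitigation (2025), https://huggingface.co/papers/2503.10358
* FaR: Enhancing Multi-Concept Text-to-Image Diffusion via Concept Fusion and Localized Refinement (2025), https://huggingface.co/papers/2504.03292
* Transfer between Modalities with MetaQueries (2025), https://huggingface.co/papers/2504.06256
* Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think (2025), https://huggingface.co/papers/2502.20172

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend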
