arxiv:2312.02139

DiffiT: Diffusion Vision Transformers for Image Generation

Published on Dec 4, 2023
· Submitted by AK on Dec 5, 2023

Abstract

A novel diffusion model using vision transformers with a hierarchical architecture and time-dependent self-attention achieves state-of-the-art performance in image generation.

AI-generated summary

Diffusion models with their powerful expressivity and high sample quality have enabled many new applications and use cases in various domains. For sample generation, these models rely on a denoising neural network that generates images by iterative denoising. Yet, the role of the denoising network architecture is not well studied, with most efforts relying on convolutional residual U-Nets. In this paper, we study the effectiveness of vision transformers in diffusion-based generative learning. Specifically, we propose a new model, denoted Diffusion Vision Transformers (DiffiT), which consists of a hybrid hierarchical architecture with a U-shaped encoder and decoder. We introduce a novel time-dependent self-attention module that allows attention layers to adapt their behavior at different stages of the denoising process in an efficient manner. We also introduce latent DiffiT, which consists of a transformer model with the proposed self-attention layers, for high-resolution image generation. Our results show that DiffiT is surprisingly effective in generating high-fidelity images, and it achieves state-of-the-art (SOTA) benchmarks on a variety of class-conditional and unconditional synthesis tasks. In latent space, DiffiT achieves a new SOTA FID score of 1.73 on the ImageNet-256 dataset. Repository: https://github.com/NVlabs/DiffiT
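
The time-dependent self-attention described in the abstract can be made concrete with a short sketch. Below is a minimal PyTorch implementation in the spirit of that description: the time-step embedding contributes to the queries, keys, and values alongside the spatial tokens, so the attention pattern can shift across denoising steps. All names (`TimeDependentSelfAttention`, `qkv_spatial`, `qkv_time`, `t_emb`) and shapes are illustrative assumptions, not the official NVlabs/DiffiT code, which additionally includes a relative positional bias omitted here.

```python
# Minimal sketch of a time-dependent self-attention layer (illustrative,
# not the official DiffiT implementation). The time embedding is projected
# and summed into the q/k/v computation so attention itself is conditioned
# on the denoising step, not only the features.
import math
import torch
import torch.nn as nn

class TimeDependentSelfAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        assert dim % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        # Separate projections for spatial tokens and the time token;
        # their contributions are summed before attention.
        self.qkv_spatial = nn.Linear(dim, 3 * dim, bias=False)
        self.qkv_time = nn.Linear(dim, 3 * dim, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, t_emb: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim) spatial tokens; t_emb: (batch, dim) time embedding
        b, n, d = x.shape
        qkv = self.qkv_spatial(x) + self.qkv_time(t_emb).unsqueeze(1)
        q, k, v = qkv.chunk(3, dim=-1)
        # Reshape to (batch, heads, tokens, head_dim) for multi-head attention.
        q, k, v = (t.view(b, n, self.num_heads, self.head_dim).transpose(1, 2)
                   for t in (q, k, v))
        attn = (q @ k.transpose(-2, -1)) / math.sqrt(self.head_dim)
        out = attn.softmax(dim=-1) @ v
        out = out.transpose(1, 2).reshape(b, n, d)
        return self.proj(out)
```

The design point worth noting is that the time signal enters the attention computation itself, rather than only scaling or shifting activations as in AdaLN-style conditioning, which is what lets the attention maps, and not just the features, vary over the denoising trajectory.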

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space



Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2312.02139 in a model README.md to link it from this page.
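
For instance, a model card that mentions the paper's arXiv URL anywhere in its README.md would be picked up here; a hypothetical excerpt:

```markdown
<!-- README.md of a hypothetical model repository -->
This model is a reimplementation of DiffiT
([arXiv:2312.02139](https://arxiv.org/abs/2312.02139)).
```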

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2312.02139 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2312.02139 in a Space README.md to link it from this page.

Collections including this paper 7
