\n","updatedAt":"2024-03-01T01:26:55.769Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7305008172988892},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2402.17245","authors":[{"_id":"65deb1fa06a8c85c5e9bb72e","user":{"_id":"633f44efc11d723b1809958b","avatarUrl":"/avatars/d70b067deabddae984c3637290b7de99.svg","isPro":false,"fullname":"Li","user":"Daiqing","type":"user"},"name":"Daiqing Li","status":"admin_assigned","statusLastChangedAt":"2024-02-28T12:42:00.530Z","hidden":false},{"_id":"65deb1fa06a8c85c5e9bb72f","user":{"_id":"63855d851769b7c4b10e1f76","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63855d851769b7c4b10e1f76/WnYyg4YivoUmK4QNic0QX.png","isPro":false,"fullname":"Aleks Kamko","user":"aykamko","type":"user"},"name":"Aleks Kamko","status":"admin_assigned","statusLastChangedAt":"2024-02-28T12:42:11.828Z","hidden":false},{"_id":"65deb1fa06a8c85c5e9bb730","user":{"_id":"636c0c4eaae2da3c76b8a9a3","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1675208663648-636c0c4eaae2da3c76b8a9a3.png","isPro":false,"fullname":"Ehsan Akhgari","user":"ehsanakh","type":"user"},"name":"Ehsan Akhgari","status":"admin_assigned","statusLastChangedAt":"2024-02-28T12:42:18.669Z","hidden":false},{"_id":"65deb1fa06a8c85c5e9bb731","user":{"_id":"642dceb0355bb9a2a43954a4","avatarUrl":"/avatars/0154f1c8fb36a8642e9b0e05063ab84e.svg","isPro":false,"fullname":"Ali Sabet","user":"asabet-pai","type":"user"},"name":"Ali Sabet","status":"admin_assigned","statusLastChangedAt":"2024-02-28T12:42:59.815Z","hidden":false},{"_id":"65deb1fa06a8c85c5e9bb732","user":{"_id":"63653383a7a1324ccd56b370","avatarUrl":"/avatars/d623a5bf1ee8fb8ebaeef26b661101ff.svg","isPro":false,"fullname":"Linmiao Xu","user":"linrock","type":"user"},"name":"Linmiao Xu","status":"admin_assigned","statusLastChangedAt":"2024-02-28T12:43:06.121Z","hidden":false},{"_id":"65deb1fa06a8c85c5e9bb733","name":"Suhail Doshi","hidden":false}],"publishedAt":"2024-02-27T06:31:52.000Z","submittedOnDailyAt":"2024-02-28T01:39:35.091Z","title":"Playground v2.5: Three Insights towards Enhancing Aesthetic Quality in\n Text-to-Image Generation","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"In this work, we share three insights for achieving state-of-the-art\naesthetic quality in text-to-image generative models. We focus on three\ncritical aspects for model improvement: enhancing color and contrast, improving\ngeneration across multiple aspect ratios, and improving human-centric fine\ndetails. First, we delve into the significance of the noise schedule in\ntraining a diffusion model, demonstrating its profound impact on realism and\nvisual fidelity. Second, we address the challenge of accommodating various\naspect ratios in image generation, emphasizing the importance of preparing a\nbalanced bucketed dataset. 
Lastly, we investigate the crucial role of aligning\nmodel outputs with human preferences, ensuring that generated images resonate\nwith human perceptual expectations. Through extensive analysis and experiments,\nPlayground v2.5 demonstrates state-of-the-art performance in terms of aesthetic\nquality under various conditions and aspect ratios, outperforming both\nwidely-used open-source models like SDXL and Playground v2, and closed-source\ncommercial systems such as DALLE 3 and Midjourney v5.2. Our model is\nopen-source, and we hope the development of Playground v2.5 provides valuable\nguidelines for researchers aiming to elevate the aesthetic quality of\ndiffusion-based image generation models.","upvotes":12,"discussionId":"65deb1ff06a8c85c5e9bb9c7","ai_summary":"Enhancements to noise schedules, aspect ratio accommodation, and alignment with human perceptual preferences improve the aesthetic quality of text-to-image generative models, leading to state-of-the-art performance.","ai_keywords":["diffusion model","noise schedule","realism","visual fidelity","balanced bucketed dataset","human preferences","aesthetic quality","state-of-the-art performance","SDXL","Midjourney v5.2"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"651707586ae752c060249656","avatarUrl":"/avatars/d34a48897e202e7b7ff2aaf075f55faf.svg","isPro":false,"fullname":"Aditya yadav","user":"adityayadav","type":"user"},{"_id":"6101c620900eaa0057c2ce1d","avatarUrl":"/avatars/bd282166c120711c65b5409dc860ac58.svg","isPro":false,"fullname":"Abdel-Dayane Marcos","user":"admarcosai","type":"user"},{"_id":"6538119803519fddb4a17e10","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6538119803519fddb4a17e10/ffJMkdx-rM7VvLTCM6ri_.jpeg","isPro":false,"fullname":"samusenps","user":"samusenps","type":"user"},{"_id":"63c5d43ae2804cb2407e4d43","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1673909278097-noauth.png","isPro":false,"fullname":"xziayro","user":"xziayro","type":"user"},{"_id":"61868ce808aae0b5499a2a95","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/61868ce808aae0b5499a2a95/F6BA0anbsoY_Z7M1JrwOe.jpeg","isPro":true,"fullname":"Sylvain Filoni","user":"fffiloni","type":"user"},{"_id":"6039478ab3ecf716b1a5fd4d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6039478ab3ecf716b1a5fd4d/_Thy4E7taiSYBLKxEKJbT.jpeg","isPro":true,"fullname":"taesiri","user":"taesiri","type":"user"},{"_id":"62cd6cb2a646b7948dbc7602","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62cd6cb2a646b7948dbc7602/ZloVoCZXkTTyq7D4fKjCi.jpeg","isPro":true,"fullname":"Paul Asselin","user":"asselinpaul","type":"user"},{"_id":"62c627c4644269e788cfee34","avatarUrl":"/avatars/cf897809ef87e7d81bd537d45ed10e84.svg","isPro":false,"fullname":"Suhail","user":"suhaild","type":"user"},{"_id":"633b71b47af633cbcd0671d8","avatarUrl":"/avatars/6671941ced18ae516db6ebfbf73e239f.svg","isPro":false,"fullname":"juand4bot","user":"juandavidgf","type":"user"},{"_id":"663ccbff3a74a20189d4aa2e","avatarUrl":"/avatars/83a54455e0157480f65c498cd9057cf2.svg","isPro":false,"fullname":"Nguyen Van 
Thanh","user":"NguyenVanThanhHust","type":"user"},{"_id":"67c15ae39e08e7fed271032b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/67c15ae39e08e7fed271032b/_ko9h5o5nZNpBCuX_q2Pc.jpeg","isPro":false,"fullname":"LiuZhiHao","user":"ZhiHao9806","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">
AI-generated summary
Enhancements to noise schedules, aspect ratio accommodation, and alignment with human perceptual preferences improve the aesthetic quality of text-to-image generative models, leading to state-of-the-art performance.
In this work, we share three insights for achieving state-of-the-art
aesthetic quality in text-to-image generative models. We focus on three
critical aspects for model improvement: enhancing color and contrast, improving
generation across multiple aspect ratios, and improving human-centric fine
details. First, we delve into the significance of the noise schedule in
training a diffusion model, demonstrating its profound impact on realism and
visual fidelity. Second, we address the challenge of accommodating various
aspect ratios in image generation, emphasizing the importance of preparing a
balanced bucketed dataset. Lastly, we investigate the crucial role of aligning
model outputs with human preferences, ensuring that generated images resonate
with human perceptual expectations. Through extensive analysis and experiments,
Playground v2.5 demonstrates state-of-the-art performance in terms of aesthetic
quality under various conditions and aspect ratios, outperforming both
widely-used open-source models like SDXL and Playground v2, and closed-source
commercial systems such as DALLE 3 and Midjourney v5.2. Our model is
open-source, and we hope the development of Playground v2.5 provides valuable
guidelines for researchers aiming to elevate the aesthetic quality of
diffusion-based image generation models.
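The three insights above are stated at a high level; the paper itself gives the full recipes. As a rough illustration of the first insight, the sketch below uses an illustrative DDPM-style linear beta schedule (an assumption for this sketch, not the schedule Playground v2.5 adopts) to show why the noise schedule matters: a standard discrete schedule leaves a nonzero terminal signal-to-noise ratio, so the model is never trained on pure noise, which is one reason cited in the literature for washed-out color and contrast; a published "zero terminal SNR" rescaling (Lin et al., 2024) removes that residual signal.

```python
import numpy as np

# Illustrative DDPM-style linear beta schedule (an assumption for this sketch,
# not the schedule used by Playground v2.5).
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas_cumprod = np.cumprod(1.0 - betas)            # \bar{alpha}_t
snr = alphas_cumprod / (1.0 - alphas_cumprod)       # SNR(t) = \bar{alpha}_t / (1 - \bar{alpha}_t)
print(f"terminal SNR: {snr[-1]:.6f}")               # > 0: the final step still carries signal

# "Zero terminal SNR" rescaling (Lin et al., 2024): shift and scale
# sqrt(\bar{alpha}_t) so the final training step is pure noise.
s = np.sqrt(alphas_cumprod)
s = (s - s[-1]) * s[0] / (s[0] - s[-1])
snr_rescaled = s**2 / (1.0 - s**2)
print(f"terminal SNR after rescaling: {snr_rescaled[-1]:.6f}")  # exactly 0
```

The second insight, a balanced bucketed dataset, can likewise be sketched as a simple data-loading strategy: group images into a small set of resolution buckets by aspect ratio and draw batches uniformly across buckets, so portrait and landscape data are not drowned out by square crops. The bucket list and sampling policy below are illustrative assumptions, not the exact pipeline described in the paper.

```python
import random
from collections import defaultdict

# Hypothetical resolution buckets (all roughly one megapixel); the real bucket
# set and balancing policy are described in the paper, not here.
BUCKETS = [(1024, 1024), (1152, 896), (896, 1152), (1216, 832), (832, 1216)]

def nearest_bucket(width, height):
    """Assign an image to the bucket whose aspect ratio is closest to its own."""
    ratio = width / height
    return min(BUCKETS, key=lambda wh: abs(wh[0] / wh[1] - ratio))

def balanced_batches(images, batch_size):
    """Yield batches drawn uniformly over buckets, so no aspect ratio dominates."""
    buckets = defaultdict(list)
    for img in images:                                # img: {"width": ..., "height": ...}
        buckets[nearest_bucket(img["width"], img["height"])].append(img)
    while any(len(v) >= batch_size for v in buckets.values()):
        bucket = random.choice([b for b, v in buckets.items() if len(v) >= batch_size])
        batch, buckets[bucket] = buckets[bucket][:batch_size], buckets[bucket][batch_size:]
        yield bucket, batch
```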