Very interesting paper. I do wonder if this method can be used for native low-resolution image generation too, such as pixel art. The lower end of the 'reliable exploration' is 256, but I'm wondering if sub 256 was unexplored due to an assumption that low res images aren't desirable.
True arbitrary resolution should also generalize on the extreme low end, right?
Thank you for your interest in our work. This paper focuses on generating high-resolution images, and our experiments are centered on resolutions of 256 and above. Our specialized decoder is designed for that regime; at lower resolutions (e.g., 128), the original VAE decoder is already a mature and effective solution, so our decoder is not necessary. We appreciate your point that arbitrary-scale generation should also cover low resolutions. Our model can produce low-resolution images as well, since high-resolution outputs can always be downsampled into excellent low-resolution versions.
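As the reply notes, a low-resolution image can be obtained by downsampling a high-resolution output. A minimal sketch using simple area averaging (the random array stands in for a generated image; the factor must divide the image size evenly):

```python
import numpy as np

def downsample_avg(img: np.ndarray, factor: int) -> np.ndarray:
    """Area-average a (H, W, C) image by an integer factor (H, W divisible by factor)."""
    h, w, c = img.shape
    # Group pixels into factor x factor blocks and average each block.
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

# Stand-in for a generated 1024x1024 RGB output.
hi = np.random.default_rng(1).random((1024, 1024, 3))
lo = downsample_avg(hi, 8)  # 128x128, a "pixel-art" scale
print(lo.shape)  # (128, 128, 3)
```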
InfGen: A Resolution-Agnostic Paradigm for Scalable Image Synthesis
Abstract
InfGen, a one-step generator replacing the VAE decoder, enables arbitrary high-resolution image generation from a fixed-size latent, significantly reducing computational complexity and generation time.
Arbitrary-resolution image generation provides a consistent visual experience across devices and has extensive applications for producers and consumers. The computational cost of current diffusion models grows quadratically with resolution, pushing 4K image generation beyond 100 seconds. To address this, we explore a second generation stage on top of latent diffusion models: the fixed-size latent produced by the diffusion model is treated as a content representation, and we propose to decode arbitrary-resolution images from this compact latent with a one-step generator. We present InfGen, which replaces the VAE decoder with this new generator to produce images at any resolution from a fixed-size latent without retraining the diffusion model. This simplifies the pipeline, reduces computational complexity, and applies to any model that uses the same latent space. Experiments show that InfGen brings many existing models into the arbitrary high-resolution era while cutting 4K image generation time to under 10 seconds.
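The decoding idea in the abstract can be sketched as follows. This is a minimal, hypothetical illustration rather than the paper's actual architecture: a fixed-size latent is bilinearly resampled to the target grid, and a single learned projection stands in for the one-step generator. All names and weights here are made up for the example.

```python
import numpy as np

def bilinear_resize(latent: np.ndarray, height: int, width: int) -> np.ndarray:
    """Resample a (C, h, w) latent grid to (C, height, width) bilinearly."""
    c, h, w = latent.shape
    ys = np.linspace(0, h - 1, height)
    xs = np.linspace(0, w - 1, width)
    y0 = np.floor(ys).astype(int); y1 = np.minimum(y0 + 1, h - 1)
    x0 = np.floor(xs).astype(int); x1 = np.minimum(x0 + 1, w - 1)
    wy = (ys - y0)[None, :, None]  # vertical interpolation weights
    wx = (xs - x0)[None, None, :]  # horizontal interpolation weights
    top = latent[:, y0][:, :, x0] * (1 - wx) + latent[:, y0][:, :, x1] * wx
    bot = latent[:, y1][:, :, x0] * (1 - wx) + latent[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy

def one_step_decode(latent: np.ndarray, height: int, width: int,
                    proj: np.ndarray) -> np.ndarray:
    # Resample the fixed latent to the target grid in a single pass
    # (no iterative denoising); a learned channel projection to RGB
    # stands in for the decoder network.
    up = bilinear_resize(latent, height, width)   # (C, H, W)
    return np.einsum("oc,chw->ohw", proj, up)     # (3, H, W)

rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 32, 32))  # fixed-size latent from the diffusion model
proj = rng.standard_normal((3, 4))         # stand-in for decoder weights
for h, w in [(256, 256), (512, 384), (1024, 1024)]:
    assert one_step_decode(latent, h, w, proj).shape == (3, h, w)
```

The key property illustrated: the latent stays the same size no matter which output resolution is requested, so the expensive diffusion step runs once while only the cheap decode varies with resolution.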
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- PixNerd: Pixel Neural Field Diffusion (2025)
- CineScale: Free Lunch in High-Resolution Cinematic Visual Generation (2025)
- GPSToken: Gaussian Parameterized Spatially-adaptive Tokenization for Image Representation and Generation (2025)
- APT: Improving Diffusion Models for High Resolution Image Generation with Adaptive Path Tracing (2025)
- HiMat: DiT-based Ultra-High Resolution SVBRDF Generation (2025)
- Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis (2025)
- Steering One-Step Diffusion Model with Fidelity-Rich Decoder for Fast Image Compression (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper