arxiv:2508.13154

4DNeX: Feed-Forward 4D Generative Modeling Made Easy

Published on Aug 18
· Submitted by tianqi liu on Aug 19
#3 Paper of the day
Authors:
Zhaoxi Chen, Tianqi Liu, Long Zhuo, Jiawei Ren, Zeng Tao, He Zhu, Fangzhou Hong, Liang Pan, Ziwei Liu
AI-generated summary

4DNeX generates high-quality dynamic 3D scene representations from a single image using a fine-tuned pretrained video diffusion model, outperforming existing methods in efficiency and generalizability.

Abstract

We present 4DNeX, the first feed-forward framework for generating 4D (i.e., dynamic 3D) scene representations from a single image. In contrast to existing methods that rely on computationally intensive optimization or require multi-frame video inputs, 4DNeX enables efficient, end-to-end image-to-4D generation by fine-tuning a pretrained video diffusion model. Specifically, 1) to alleviate the scarcity of 4D data, we construct 4DNeX-10M, a large-scale dataset with high-quality 4D annotations generated using advanced reconstruction approaches. 2) we introduce a unified 6D video representation that jointly models RGB and XYZ sequences, facilitating structured learning of both appearance and geometry. 3) we propose a set of simple yet effective adaptation strategies to repurpose pretrained video diffusion models for 4D modeling. 4DNeX produces high-quality dynamic point clouds that enable novel-view video synthesis. Extensive experiments demonstrate that 4DNeX outperforms existing 4D generation methods in efficiency and generalizability, offering a scalable solution for image-to-4D modeling and laying the foundation for generative 4D world models that simulate dynamic scene evolution.
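The unified 6D video representation in point 2) is the core modeling idea: each frame pairs an RGB image with a per-pixel XYZ point map, so appearance and geometry live in one tensor that a video diffusion model can denoise jointly. Below is a minimal NumPy sketch of what such a packing might look like; the shapes, variable names, and the direct pixel-to-point unprojection are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch of a unified 6D video representation: RGB + XYZ per pixel per frame.
# Toy shapes; real data would come from the reconstruction pipeline (4DNeX-10M).
import numpy as np

T, H, W = 8, 64, 64  # frames, height, width (toy sizes)

rgb = np.random.rand(T, H, W, 3).astype(np.float32)  # appearance sequence
xyz = np.random.rand(T, H, W, 3).astype(np.float32)  # geometry (point maps)

# Channel-wise concatenation gives the joint 6-channel video tensor
# that the diffusion model would be adapted to generate: (T, H, W, 6).
video_6d = np.concatenate([rgb, xyz], axis=-1)
assert video_6d.shape == (T, H, W, 6)

# A dynamic point cloud falls out directly: every pixel of every frame is
# a colored 3D point, which is what enables novel-view video synthesis.
points = video_6d.reshape(-1, 6)
colors, positions = points[:, :3], points[:, 3:]
print(positions.shape, colors.shape)  # (32768, 3) (32768, 3)
```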

Community


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2508.13154 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2508.13154 in a Space README.md to link it from this page.

Collections including this paper 2
