arxiv:2509.19296

Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation

Published on Sep 23 · Submitted by taesiri on Sep 24
Authors: Sherwin Bahmani, Tianchang Shen, Jiawei Ren, Jiahui Huang, Yifeng Jiang, Haithem Turki, Andrea Tagliasacchi, David B. Lindell, Zan Gojcic, Sanja Fidler, Huan Ling, Jun Gao, Xuanchi Ren

Abstract

The ability to generate virtual environments is crucial for applications ranging from gaming to physical AI domains such as robotics, autonomous driving, and industrial AI. Current learning-based 3D reconstruction methods rely on captured real-world multi-view data, which is not always readily available. Recent advances in video diffusion models have shown remarkable imaginative capabilities, yet their 2D nature limits their applicability to simulation settings in which a robot needs to navigate and interact with the environment. In this paper, we propose a self-distillation framework that distills the implicit 3D knowledge in video diffusion models into an explicit 3D Gaussian Splatting (3DGS) representation, eliminating the need for multi-view training data. Specifically, we augment the typical RGB decoder with a 3DGS decoder, which is supervised by the output of the RGB decoder. With this approach, the 3DGS decoder can be trained purely on synthetic data generated by video diffusion models. At inference time, our model can synthesize 3D scenes from either a text prompt or a single image for real-time rendering. Our framework further extends to dynamic 3D scene generation from a monocular input video. Experimental results show that our framework achieves state-of-the-art performance in static and dynamic 3D scene generation.

AI-generated summary

A self-distillation framework converts implicit 3D knowledge from video diffusion models into an explicit 3D Gaussian Splatting representation, enabling 3D scene generation from text or images.
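The abstract outlines the core training idea: a 3DGS decoder is attached alongside the video model's RGB decoder and supervised by the RGB decoder's own output, so no captured multi-view data is needed. The sketch below illustrates that self-distillation objective under assumed PyTorch usage; all names (GaussianHead, render_gaussians, distillation_step), tensor shapes, and the stand-in renderer are hypothetical illustrations, not the authors' released code.

```python
# Minimal sketch of the self-distillation idea, assuming PyTorch and hypothetical
# module names; a real system would use a differentiable 3DGS rasterizer.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GaussianHead(nn.Module):
    """Toy 3DGS decoder: maps a scene latent to per-Gaussian parameters."""

    def __init__(self, latent_dim=256, num_gaussians=1024):
        super().__init__()
        self.num_gaussians = num_gaussians
        # 3 (mean) + 3 (scale) + 4 (rotation) + 3 (color) + 1 (opacity) = 14 params
        self.mlp = nn.Sequential(
            nn.Linear(latent_dim, 512), nn.ReLU(),
            nn.Linear(512, num_gaussians * 14),
        )

    def forward(self, latent):
        p = self.mlp(latent).view(-1, self.num_gaussians, 14)
        means, log_scales, quats, colors, opacity = p.split([3, 3, 4, 3, 1], dim=-1)
        return means, log_scales.exp(), quats, colors.sigmoid(), opacity.sigmoid()


def render_gaussians(gaussians, cameras, height=64, width=64):
    """Stand-in for a differentiable Gaussian-splatting renderer.

    Only the interface matters here: it takes Gaussians plus T camera poses and
    returns (B, T, 3, H, W) images with gradients flowing back to the Gaussians.
    """
    means, scales, quats, colors, opacity = gaussians
    b, t = means.shape[0], cameras.shape[1]
    flat = (colors * opacity).mean(dim=(1, 2))  # (B,) toy "image" statistic
    return flat.view(b, 1, 1, 1, 1).expand(b, t, 3, height, width)


def distillation_step(video_latents, cameras, rgb_decoder, gs_decoder, optimizer):
    """One self-distillation step: the RGB decoder acts as the (frozen) teacher."""
    with torch.no_grad():
        target = rgb_decoder(video_latents)            # pseudo ground-truth frames
    gaussians = gs_decoder(video_latents.mean(dim=1))  # student: explicit 3D scene
    rendered = render_gaussians(gaussians, cameras)    # re-render along the same cameras
    loss = F.l1_loss(rendered, target)                 # photometric distillation loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()


if __name__ == "__main__":
    B, T, D = 2, 4, 256
    # Toy teacher: in the paper this role is played by the video model's RGB decoder.
    rgb_decoder = lambda z: torch.rand(z.shape[0], z.shape[1], 3, 64, 64)
    gs_decoder = GaussianHead(latent_dim=D)
    opt = torch.optim.Adam(gs_decoder.parameters(), lr=1e-4)
    latents = torch.randn(B, T, D)     # per-frame latents from the video model
    cameras = torch.randn(B, T, 4, 4)  # placeholder camera-to-world matrices
    print(distillation_step(latents, cameras, rgb_decoder, gs_decoder, opt))
```

Because the teacher frames come from the diffusion model's own RGB decoding of generated videos, the pseudo ground truth is entirely synthetic; a real implementation would swap the toy render_gaussians above for a proper differentiable Gaussian-splatting rasterizer.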

Community


Dude this is insane, really great job by the team.

Great research should inspire average people. I don't have an advanced degree, and this was very readable, even pleasant.

The multi-trajectory part is especially interesting; there are so many applications beyond video.


This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* IDCNet: Guided Video Diffusion for Metric-Consistent RGBD Scene Generation with Precise Camera Control (https://huggingface.co/papers/2508.04147) (2025)
* GSFixer: Improving 3D Gaussian Splatting with Reference-Guided Video Diffusion Priors (https://huggingface.co/papers/2508.09667) (2025)
* 4DNeX: Feed-Forward 4D Generative Modeling Made Easy (https://huggingface.co/papers/2508.13154) (2025)
* Dream-to-Recon: Monocular 3D Reconstruction with Diffusion-Depth Distillation from Single Images (https://huggingface.co/papers/2508.02323) (2025)
* Complete Gaussian Splats from a Single Image with Denoising Diffusion Models (https://huggingface.co/papers/2508.21542) (2025)
* Matrix-3D: Omnidirectional Explorable 3D World Generation (https://huggingface.co/papers/2508.08086) (2025)
* LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding (https://huggingface.co/papers/2508.19204) (2025)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any Paper on Hugging Face, check out this Space: https://huggingface.co/spaces/librarian-bots/recommend_similar_papers

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 1

Datasets citing this paper 1

Spaces citing this paper 0


Collections including this paper 2
