arxiv:2507.23785

Gaussian Variation Field Diffusion for High-fidelity Video-to-4D Synthesis

Published on Jul 31
· Submitted by Bowen Zhang on Aug 7
Authors:
Bowen Zhang, Sicheng Xu, Chuxin Wang, Jiaolong Yang, Feng Zhao, Dong Chen, Baining Guo
Abstract

In this paper, we present a novel framework for video-to-4D generation that creates high-quality dynamic 3D content from single video inputs. Direct 4D diffusion modeling is extremely challenging due to costly data construction and the high-dimensional nature of jointly representing 3D shape, appearance, and motion. We address these challenges by introducing a Direct 4DMesh-to-GS Variation Field VAE that directly encodes canonical Gaussian Splats (GS) and their temporal variations from 3D animation data without per-instance fitting, and compresses high-dimensional animations into a compact latent space. Building upon this efficient representation, we train a Gaussian Variation Field diffusion model with a temporal-aware Diffusion Transformer conditioned on input videos and canonical GS. Trained on carefully curated animatable 3D objects from the Objaverse dataset, our model demonstrates superior generation quality compared to existing methods. It also exhibits remarkable generalization to in-the-wild video inputs despite being trained exclusively on synthetic data, paving the way for generating high-quality animated 3D content. Project page: https://gvfdiffusion.github.io/.

AI-generated summary

A novel framework uses a Direct 4DMesh-to-GS Variation Field VAE and Gaussian Variation Field diffusion model to generate high-quality dynamic 3D content from single video inputs, demonstrating superior quality and generalization.
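To make the pipeline above more concrete, here is a minimal, hypothetical PyTorch sketch of the core VAE idea: encode a canonical Gaussian Splat set together with one deformed frame into a compact latent, then decode that latent back into a per-Gaussian variation (delta). Every name, dimension, and layer choice below is an illustrative assumption, not the authors' implementation; the paper's Direct 4DMesh-to-GS Variation Field VAE works directly from mesh animation data and is substantially more involved.

```python
# Hypothetical sketch only: compresses per-frame Gaussian Splat variations
# into a compact latent, as the abstract describes at a high level.
import torch
import torch.nn as nn

class GSVariationFieldVAE(nn.Module):
    def __init__(self, gs_dim=14, latent_dim=32, hidden=256):
        super().__init__()
        # gs_dim: assumed per-Gaussian attributes (position, scale,
        # rotation, opacity, color); 14 is an illustrative choice.
        self.encoder = nn.Sequential(
            nn.Linear(gs_dim * 2, hidden), nn.SiLU(),
            nn.Linear(hidden, latent_dim * 2),  # predicts mean and log-variance
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim + gs_dim, hidden), nn.SiLU(),
            nn.Linear(hidden, gs_dim),  # predicts a per-Gaussian variation (delta)
        )

    def forward(self, canonical, deformed):
        # canonical, deformed: (num_gaussians, gs_dim) for one animation frame
        stats = self.encoder(torch.cat([canonical, deformed], dim=-1))
        mean, logvar = stats.chunk(2, dim=-1)
        z = mean + torch.randn_like(mean) * (0.5 * logvar).exp()  # reparameterize
        delta = self.decoder(torch.cat([z, canonical], dim=-1))
        return canonical + delta, mean, logvar  # reconstruction + KL terms

# Toy usage: 1024 Gaussians with a small random deformation.
vae = GSVariationFieldVAE()
canonical = torch.randn(1024, 14)
deformed = canonical + 0.01 * torch.randn_like(canonical)
recon, mean, logvar = vae(canonical, deformed)
```

Predicting a delta on top of the canonical Gaussians, rather than regenerating full frames, is what keeps the latent compact: the decoder only has to model motion, not shape and appearance.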

Community

Paper author · Paper submitter

In this paper, we introduce a novel framework to address the challenging task of 4D generative modeling. To efficiently construct the large-scale training dataset and reduce the modeling difficulty for diffusion, we first introduce a Direct 4DMesh-to-GS Variation Field VAE, which efficiently compresses complex motion information into a compact latent space without requiring costly per-instance fitting. We then train a Gaussian Variation Field diffusion model that generates high-quality dynamic variation fields conditioned on input videos and canonical 3DGS. By decomposing 4D generation into canonical 3DGS generation and Gaussian Variation Field modeling, our method significantly reduces computational complexity while maintaining high fidelity. Quantitative and qualitative evaluations demonstrate that our approach consistently outperforms existing methods. Furthermore, our model exhibits remarkable generalization to in-the-wild video inputs, advancing the state of high-quality animated 3D content generation.
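As a rough illustration of that decomposition, the inference loop might look like the sketch below. Every callable here (canonical_model, diffusion, vae_decoder) is a hypothetical placeholder standing in for components the paper describes; none of this is the project's actual API.

```python
# Hedged sketch of two-stage video-to-4D inference: generate a canonical
# 3DGS, then denoise per-frame variation-field latents conditioned on the
# input video and the canonical Gaussians. All interfaces are assumptions.
import torch

LATENT_DIM = 32  # assumed latent width, matching the VAE sketch above

@torch.no_grad()
def video_to_4d(video, canonical_model, diffusion, vae_decoder, steps=50):
    """video: (T, C, H, W) input frames; returns (T, N, gs_dim) Gaussians."""
    canonical_gs = canonical_model(video[0])           # stage 1: canonical 3DGS
    latents = torch.randn(video.shape[0], LATENT_DIM)  # one latent per frame
    for t in diffusion.timesteps(steps):               # stage 2: iterative
        latents = diffusion.step(                      # denoising, conditioned on
            latents, t, cond=(video, canonical_gs))    # video and canonical GS
    deltas = vae_decoder(latents, canonical_gs)        # decode variation fields
    return canonical_gs.unsqueeze(0) + deltas          # animate the canonical GS
```

The computational saving comes from diffusing only the low-dimensional latent sequence: the expensive static geometry is produced once, and the diffusion model never operates on raw per-Gaussian attributes across all frames.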

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2507.23785 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2507.23785 in a Space README.md to link it from this page.

Collections including this paper 4
