arXiv:2402.03570

Diffusion World Model

Published on Feb 5, 2024
· Submitted by AK on Feb 7, 2024
Authors: Zihan Ding, Amy Zhang, Yuandong Tian, Qinqing Zheng

AI-generated summary

A conditional diffusion model (DWM) is introduced for long-horizon predictions in model-based reinforcement learning, surpassing traditional one-step models and achieving state-of-the-art performance.

Abstract

We introduce Diffusion World Model (DWM), a conditional diffusion model capable of predicting multistep future states and rewards concurrently. As opposed to traditional one-step dynamics models, DWM offers long-horizon predictions in a single forward pass, eliminating the need for recursive queries. We integrate DWM into model-based value estimation, where the short-term return is simulated by future trajectories sampled from DWM. In the context of offline reinforcement learning, DWM can be viewed as a conservative value regularization through generative modeling. Alternatively, it can be seen as a data source that enables offline Q-learning with synthetic data. Our experiments on the D4RL dataset confirm the robustness of DWM to long-horizon simulation. In terms of absolute performance, DWM significantly surpasses one-step dynamics models with a 44% performance gain, and achieves state-of-the-art performance.
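
For intuition, here is a minimal, illustrative Python/PyTorch sketch of the model-based value estimation described in the abstract. It is not the authors' implementation: dwm.sample_trajectory, q_net, and policy are hypothetical interfaces standing in for a trained diffusion world model, a learned Q-function, and the current policy.

    import torch

    # Hypothetical interfaces (not from the paper's codebase):
    #   dwm.sample_trajectory(state, action, horizon) -> (states, rewards)
    #     denoises an entire horizon-step future in one sampling pass,
    #     instead of recursively querying a one-step dynamics model.
    #   q_net(state, action) -> Q-value used to bootstrap beyond the horizon.
    #   policy(state)        -> action under the current policy.

    def dwm_value_estimate(dwm, q_net, policy, state, action,
                           horizon=8, gamma=0.99):
        """Short-term return simulated from one DWM-sampled trajectory,
        plus a bootstrapped terminal value."""
        states, rewards = dwm.sample_trajectory(state, action, horizon)

        # Discounted sum of the DWM-predicted rewards over the horizon.
        discounts = gamma ** torch.arange(horizon, dtype=rewards.dtype)
        short_term_return = (discounts * rewards).sum()

        # Bootstrap from the final predicted state with the current policy.
        terminal_value = q_net(states[-1], policy(states[-1]))

        return short_term_return + gamma ** horizon * terminal_value

Because the whole trajectory comes from a single sampling pass, model errors do not compound step by step the way they do when a one-step dynamics model is unrolled recursively.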

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2402.03570 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2402.03570 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2402.03570 in a Space README.md to link it from this page.
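For illustration, a model, dataset, or Space README.md only needs to mention the paper's arXiv URL for the Hub to link the repository from this page; the repository name and wording below are hypothetical:

    # dwm-reimplementation (hypothetical model card)

    This model reproduces Diffusion World Model
    (https://arxiv.org/abs/2402.03570) on the D4RL benchmark.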

Collections including this paper 7
