Discrete Diffusion in Large Language and Multimodal Models: A Survey
Runpeng Yu, Qi Li, Xinchao Wang
Published June 16, 2025 (arXiv 2506.13759)
AI-generated summary
Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs) enable parallel generation and faster inference than autoregressive models through denoising-based strategies and full attention mechanisms.
In this work, we provide a systematic survey of Discrete Diffusion Language
Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs).
Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token,
parallel decoding paradigm using full attention and a denoising-based
generation strategy. This paradigm naturally enables parallel generation,
fine-grained output controllability, and dynamic, response-aware perception,
capabilities that were previously difficult to achieve with AR models.
Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as
many open-source academic d(M)LLMs, have demonstrated performance
comparable to their autoregressive counterparts while achieving up to 10x
faster inference.
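To make the decoding paradigm concrete, here is a minimal sketch of confidence-based parallel demasking, the strategy many dLLMs use at inference time: every position starts masked, the denoiser scores all masked positions at once, and the most confident fraction is committed each step. The `toy_model` function is a hypothetical stand-in for a trained denoising network, introduced only for illustration; it is not part of the survey.

```python
import math
import random

MASK = "<mask>"


def toy_model(tokens, target):
    """Stand-in for a dLLM denoiser. For each masked position it returns a
    (token, confidence) pair. Here it simply 'predicts' the target sequence
    with random confidences, since a real network is out of scope."""
    return {i: (target[i], random.random())
            for i, t in enumerate(tokens) if t == MASK}


def diffusion_decode(target, steps=4, seed=0):
    """Iterative parallel demasking: at each denoising step the model scores
    every masked position simultaneously (full attention over the sequence),
    then the top-confidence fraction of positions is unmasked."""
    random.seed(seed)
    tokens = [MASK] * len(target)
    for step in range(steps):
        preds = toy_model(tokens, target)
        if not preds:
            break
        # Commit enough positions per step so all are filled by the last step.
        k = math.ceil(len(preds) / (steps - step))
        ranked = sorted(preds.items(), key=lambda kv: -kv[1][1])
        for i, (tok, _conf) in ranked[:k]:
            tokens[i] = tok
    return tokens
```

Unlike AR decoding, which emits one token per forward pass left to right, each step here fills in several tokens anywhere in the sequence, which is the source of the speedups reported above.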
The advancement of discrete diffusion LLMs and MLLMs has been largely driven
by progress in two domains. The first is the development of autoregressive LLMs
and MLLMs, which has accumulated vast amounts of data, benchmarks, and
foundational infrastructure for training and inference. The second contributing
domain is the evolution of the mathematical models underlying discrete
diffusion. Together, these advancements have catalyzed a surge in dLLMs and
dMLLMs research in early 2025.
In this work, we present a comprehensive overview of the research in the dLLM
and dMLLM domains. We trace the historical development of dLLMs and dMLLMs,
formalize the underlying mathematical frameworks, and categorize representative
models. We further analyze key techniques for training and inference, and
summarize emerging applications across language, vision-language, and
biological domains. We conclude by discussing future directions for research
and deployment.
Paper collection: https://github.com/LiQiiiii/DLLM-Survey