Neural Discrete Representation Learning
Authors: Aaron van den Oord, Oriol Vinyals, Koray Kavukcuoglu
Published: 2017-11-02 (arXiv:1711.00937)
Abstract
A Vector Quantised-Variational AutoEncoder learns discrete representations, enabling high-quality generation and unsupervised phoneme learning.
Learning useful representations without supervision remains a key challenge in machine learning. In this paper, we propose a simple yet powerful generative model that learns such discrete representations. Our model, the Vector Quantised-Variational AutoEncoder (VQ-VAE), differs from VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. In order to learn a discrete latent representation, we incorporate ideas from vector quantisation (VQ). Using the VQ method allows the model to circumvent issues of "posterior collapse" -- where the latents are ignored when they are paired with a powerful autoregressive decoder -- typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high quality images, videos, and speech as well as doing high quality speaker conversion and unsupervised learning of phonemes, providing further evidence of the utility of the learnt representations.
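The abstract says the encoder outputs discrete codes by borrowing from vector quantisation. As a minimal sketch of that single step, assuming a nearest-neighbour lookup under squared Euclidean distance, the snippet below maps continuous encoder outputs to indices in a codebook. The function names, the toy codebook, and the example vectors are hypothetical illustrations, not values or code from the paper, and training details (how the codebook and encoder are learnt) are left out entirely.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantise(
    encoder_outputs: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[Vector]]:
    """Map each continuous encoder output to its nearest codebook entry.

    Returns the discrete code indices and the matching codebook vectors.
    """
    indices: List[int] = []
    quantised: List[Vector] = []
    for z in encoder_outputs:
        # Pick the codebook index whose vector is closest to z.
        nearest = min(
            range(len(codebook)),
            key=lambda k: squared_distance(z, codebook[k]),
        )
        indices.append(nearest)
        quantised.append(codebook[nearest])
    return indices, quantised


# Hypothetical toy values chosen for illustration only.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]
encoder_outputs = [(0.1, -0.2), (0.9, 1.3), (-0.8, 1.7)]

indices, quantised = quantise(encoder_outputs, codebook)
print(indices)  # prints [0, 1, 2]
```

The integer indices are the discrete latent representation; a separate prior over those indices is what the abstract refers to as being learnt rather than static.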
Community
Unlocking the Power of Neural Discrete Representation Learning (VQ-VAE)
Links 🔗:
👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/