https://huggingface.co/TechxGenus/Mini-Jamba
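For anyone who wants to try the checkpoint linked above, here is a minimal loading sketch using the Hugging Face `transformers` library. The repo id is taken from the link in the comment; whether this particular checkpoint requires `trust_remote_code` or a specific `transformers` release is an assumption, not something stated in this thread.

```python
# Minimal sketch (assumptions noted): load the community Jamba checkpoint
# linked above and run a short generation with Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "TechxGenus/Mini-Jamba"  # repo from the comment above

# trust_remote_code=True is an assumption in case the repo ships custom
# modeling code; newer transformers releases include native Jamba support.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("def fibonacci(n):", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```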
This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [DenseMamba: State Space Models with Dense Hidden Connection for Efficient Large Language Models](https://huggingface.co/papers/2403.00818) (2024)
* [Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference](https://huggingface.co/papers/2403.14520) (2024)
* [Dynamic Memory Compression: Retrofitting LLMs for Accelerated Inference](https://huggingface.co/papers/2403.09636) (2024)
* [ALISA: Accelerating Large Language Model Inference via Sparsity-Aware KV Caching](https://huggingface.co/papers/2403.17312) (2024)
* [Can Mamba Learn How to Learn? A Comparative Study on In-Context Learning Tasks](https://huggingface.co/papers/2402.04248) (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
\n","updatedAt":"2024-04-02T01:24:52.598Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7239789962768555},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}},{"id":"66166e0a7dc4e8388b470590","author":{"_id":"648a210e9da3cc3506961585","avatarUrl":"/avatars/808e9d7ac99837fe79169d0b8d49c366.svg","fullname":"Ajith V Prabhakar","name":"ajithprabhakar","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":2},"createdAt":"2024-04-10T10:46:34.000Z","type":"comment","data":{"edited":true,"hidden":false,"latest":{"raw":"I have featured this paper on my blog. You can read it at https://rb.gy/6r0rs3","html":"
\n","updatedAt":"2024-06-08T20:55:05.746Z","author":{"_id":"6186ddf6a7717cb375090c01","avatarUrl":"/avatars/716b6a7d1094c8036b2a8a7b9063e8aa.svg","fullname":"Julien BLANCHON","name":"blanchon","type":"user","isPro":true,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":142}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.5135638117790222},"editors":["blanchon"],"editorAvatarUrls":["/avatars/716b6a7d1094c8036b2a8a7b9063e8aa.svg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2403.19887","authors":[{"_id":"660a2edb40e346fba5b51b82","user":{"_id":"64e1bffd87a7332d25b79540","avatarUrl":"/avatars/2205afcba52a00e51517a30448693c65.svg","isPro":false,"fullname":"Opher Lieber","user":"opherlieber","type":"user"},"name":"Opher Lieber","status":"admin_assigned","statusLastChangedAt":"2024-04-25T07:34:41.626Z","hidden":false},{"_id":"660a2edb40e346fba5b51b83","name":"Barak Lenz","hidden":false},{"_id":"660a2edb40e346fba5b51b84","name":"Hofit Bata","hidden":false},{"_id":"660a2edb40e346fba5b51b85","user":{"_id":"635aa711b13777d661f99a71","avatarUrl":"/avatars/78409dd88639cb9e74c055223b7540d1.svg","isPro":false,"fullname":"Gal Cohen","user":"galco","type":"user"},"name":"Gal Cohen","status":"claimed_verified","statusLastChangedAt":"2024-04-02T11:04:25.248Z","hidden":false},{"_id":"660a2edb40e346fba5b51b86","name":"Jhonathan Osin","hidden":false},{"_id":"660a2edb40e346fba5b51b87","name":"Itay Dalmedigos","hidden":false},{"_id":"660a2edb40e346fba5b51b88","name":"Erez Safahi","hidden":false},{"_id":"660a2edb40e346fba5b51b89","user":{"_id":"65e7310d2519cf95214030fa","avatarUrl":"/avatars/0c3124e8c8b40c4233244bac0ef605f1.svg","isPro":false,"fullname":"Shaked Meirom","user":"ShakedM","type":"user"},"name":"Shaked Meirom","status":"admin_assigned","statusLastChangedAt":"2024-04-25T07:35:15.365Z","hidden":false},{"_id":"660a2edb40e346fba5b51b8a","user":{"_id":"614c57f1ee44bcfe57b366d6","avatarUrl":"/avatars/186a9aed84681246f48ed2a012c50def.svg","isPro":false,"fullname":"Yonatan Belinkov","user":"belinkov","type":"user"},"name":"Yonatan Belinkov","status":"admin_assigned","statusLastChangedAt":"2024-04-25T07:35:22.149Z","hidden":false},{"_id":"660a2edb40e346fba5b51b8b","name":"Shai Shalev-Shwartz","hidden":false},{"_id":"660a2edb40e346fba5b51b8c","user":{"_id":"65375d628e37b02865e00265","avatarUrl":"/avatars/ede8d84f60645512a6c22bbf3b6ade74.svg","isPro":false,"fullname":"Omri Abend","user":"omriabnd","type":"user"},"name":"Omri Abend","status":"admin_assigned","statusLastChangedAt":"2024-04-25T07:35:35.699Z","hidden":false},{"_id":"660a2edb40e346fba5b51b8d","user":{"_id":"65a7ca95e5ddfd7d1d97a6a0","avatarUrl":"/avatars/84ebb95a07dd884e34f0170b07b1d652.svg","isPro":false,"fullname":"Raz Alon","user":"RazAlon","type":"user"},"name":"Raz Alon","status":"admin_assigned","statusLastChangedAt":"2024-04-25T07:35:42.209Z","hidden":false},{"_id":"660a2edb40e346fba5b51b8e","user":{"_id":"65f2d8dfab7d7db7f5d78900","avatarUrl":"/avatars/feaa4973b6a6741c159bb2bc94d1ba59.svg","isPro":false,"fullname":"Tomer Asida","user":"tomeras1","type":"user"},"name":"Tomer Asida","status":"claimed_verified","statusLastChangedAt":"2024-04-02T11:04:16.537Z","hidden":false},{"_id":"660a2edb40e346fba5b51b8f","user":{"_id":"65fbfed32c813664be51d24b","avatarUrl":"/avatars/acc6f4c5968003f7e2a587616fffd069.svg","isPro":false,"fullname":"Amir Bergman","user":"amirbe","type":"user"},"name":"Amir 
Bergman","status":"admin_assigned","statusLastChangedAt":"2024-04-25T07:35:49.115Z","hidden":false},{"_id":"660a2edb40e346fba5b51b90","name":"Roman Glozman","hidden":false},{"_id":"660a2edb40e346fba5b51b91","user":{"_id":"60ede03f36e9ceda399a423e","avatarUrl":"/avatars/da4794bfca8c430d271fcb50a19da28f.svg","isPro":false,"fullname":"Michael Gokhman","user":"michael-go","type":"user"},"name":"Michael Gokhman","status":"admin_assigned","statusLastChangedAt":"2024-04-25T07:35:59.812Z","hidden":false},{"_id":"660a2edb40e346fba5b51b92","name":"Avashalom Manevich","hidden":false},{"_id":"660a2edb40e346fba5b51b93","user":{"_id":"646630317ff8fcbef7d35079","avatarUrl":"/avatars/a0650f7e9d8eb3cb01ba16fdf4616360.svg","isPro":false,"fullname":"Nir Ratner","user":"Nirer","type":"user"},"name":"Nir Ratner","status":"admin_assigned","statusLastChangedAt":"2024-04-25T07:36:08.655Z","hidden":false},{"_id":"660a2edb40e346fba5b51b94","name":"Noam Rozen","hidden":false},{"_id":"660a2edb40e346fba5b51b95","name":"Erez Shwartz","hidden":false},{"_id":"660a2edb40e346fba5b51b96","user":{"_id":"65e6d33bacf5bc4e4e18a1c1","avatarUrl":"/avatars/b9325f34fd3275782b1ae487ec08d9a2.svg","isPro":false,"fullname":"Mor Zusman","user":"mzusman","type":"user"},"name":"Mor Zusman","status":"admin_assigned","statusLastChangedAt":"2024-04-25T07:36:21.962Z","hidden":false},{"_id":"660a2edb40e346fba5b51b97","name":"Yoav Shoham","hidden":false}],"publishedAt":"2024-03-28T23:55:06.000Z","submittedOnDailyAt":"2024-04-01T02:19:48.032Z","title":"Jamba: A Hybrid Transformer-Mamba Language Model","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"We present Jamba, a new base large language model based on a novel hybrid\nTransformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba\ninterleaves blocks of Transformer and Mamba layers, enjoying the benefits of\nboth model families. MoE is added in some of these layers to increase model\ncapacity while keeping active parameter usage manageable. This flexible\narchitecture allows resource- and objective-specific configurations. In the\nparticular configuration we have implemented, we end up with a powerful model\nthat fits in a single 80GB GPU. Built at large scale, Jamba provides high\nthroughput and small memory footprint compared to vanilla Transformers, and at\nthe same time state-of-the-art performance on standard language model\nbenchmarks and long-context evaluations. Remarkably, the model presents strong\nresults for up to 256K tokens context length. We study various architectural\ndecisions, such as how to combine Transformer and Mamba layers, and how to mix\nexperts, and show that some of them are crucial in large scale modeling. We\nalso describe several interesting properties of these architectures which the\ntraining and evaluation of Jamba have revealed, and plan to release checkpoints\nfrom various ablation runs, to encourage further exploration of this novel\narchitecture. 
We make the weights of our implementation of Jamba publicly\navailable under a permissive license.","upvotes":111,"discussionId":"660a2edc40e346fba5b51bb4","ai_summary":"Jamba, a hybrid Transformer-Mamba MoE model, achieves state-of-the-art performance with high throughput and minimal memory usage, supporting large context lengths.","ai_keywords":["Transformer","Mamba","mixture-of-experts (MoE)","interleaving layers","model capacity","active parameter usage","resource-specific configurations","memory footprint","long-context evaluations","ablation runs"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"6101c620900eaa0057c2ce1d","avatarUrl":"/avatars/bd282166c120711c65b5409dc860ac58.svg","isPro":false,"fullname":"Abdel-Dayane Marcos","user":"admarcosai","type":"user"},{"_id":"62deb6c3520a9fae78bb9bc3","avatarUrl":"/avatars/5d75fffa9bad36d20adb8f47141d1f0b.svg","isPro":false,"fullname":"Literate Goggles","user":"literate-goggles","type":"user"},{"_id":"64bbe9b236eb058cd9d6a5b9","avatarUrl":"/avatars/c7c01a3fa8809e73800392679abff6d5.svg","isPro":false,"fullname":"Kai Zuberbühler","user":"kaizuberbuehler","type":"user"},{"_id":"64ca7c04710645aa7bdbbfff","avatarUrl":"/avatars/c12f4cb6dc1ff0010edb3ef4cfcccd7c.svg","isPro":false,"fullname":"Lize Pirenne","user":"Inversta","type":"user"},{"_id":"632289a7909ac44b572bf51a","avatarUrl":"/avatars/7ebca310419ea7c3e0f5a4b57fbbdf6e.svg","isPro":false,"fullname":"galen","user":"AlphaGalen","type":"user"},{"_id":"64316678dec2a70d8130aa9d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/EAS7OJwvyInle8J7IIBbw.jpeg","isPro":true,"fullname":"Levi Sverdlov","user":"Sverd","type":"user"},{"_id":"655ac762cb17ec19ef82719b","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/655ac762cb17ec19ef82719b/1kDncYrGLYS_2SR8cNdAL.png","isPro":false,"fullname":"Welcome to matlok","user":"matlok","type":"user"},{"_id":"65ef656628d893484b45d61c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/65ef656628d893484b45d61c/VOjfgUSxycnM3nNkHxGqs.jpeg","isPro":false,"fullname":"Daniel Roich","user":"roichdaniel","type":"user"},{"_id":"614a20dc26e73aded3219bd3","avatarUrl":"/avatars/0e8f9e5293feb1792ee2ad1a8cf14051.svg","isPro":true,"fullname":"MOHAMMED ABDALLAH","user":"melsiddieg","type":"user"},{"_id":"63486df1f8f01fcc4b23e97d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63486df1f8f01fcc4b23e97d/RDpX29ibKTJhgisgtvZ6M.png","isPro":false,"fullname":"Satyam","user":"satyamt","type":"user"},{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"62a4ac6fd83c3facafa50892","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62a4ac6fd83c3facafa50892/qFpobw9B5XaLZvwn0XbmB.jpeg","isPro":false,"fullname":"Mohammed Brıman","user":"mohammedbriman","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":1}">
AI-generated summary

Jamba, a hybrid Transformer-Mamba MoE model, achieves state-of-the-art performance with high throughput and minimal memory usage, supporting large context lengths.

Abstract
We present Jamba, a new base large language model based on a novel hybrid
Transformer-Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba
interleaves blocks of Transformer and Mamba layers, enjoying the benefits of
both model families. MoE is added in some of these layers to increase model
capacity while keeping active parameter usage manageable. This flexible
architecture allows resource- and objective-specific configurations. In the
particular configuration we have implemented, we end up with a powerful model
that fits in a single 80GB GPU. Built at large scale, Jamba provides high
throughput and small memory footprint compared to vanilla Transformers, and at
the same time state-of-the-art performance on standard language model
benchmarks and long-context evaluations. Remarkably, the model presents strong
results for up to 256K tokens context length. We study various architectural
decisions, such as how to combine Transformer and Mamba layers, and how to mix
experts, and show that some of them are crucial in large scale modeling. We
also describe several interesting properties of these architectures which the
training and evaluation of Jamba have revealed, and plan to release checkpoints
from various ablation runs, to encourage further exploration of this novel
architecture. We make the weights of our implementation of Jamba publicly
available under a permissive license.
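To make the architectural description in the abstract concrete, below is a minimal, self-contained sketch of the interleaving pattern in PyTorch. It is not the paper's implementation: the attention-to-Mamba ratio, the rule for where MoE is placed, the expert count, and especially the Mamba mixer (reduced here to a gated-MLP stub) are placeholders chosen only to show how Transformer layers, Mamba layers, and MoE sub-layers can be interleaved in one block.

```python
# Schematic sketch of a hybrid Transformer/Mamba block with MoE in some layers.
# All ratios and module internals are placeholders, not Jamba's actual values.
import torch
import torch.nn as nn


class AttentionLayer(nn.Module):
    """Self-attention sub-layer with a residual connection (Transformer side)."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        out, _ = self.attn(h, h, h, need_weights=False)
        return x + out


class MambaLayerStub(nn.Module):
    """Stand-in for a Mamba (selective state-space) mixer; a real implementation
    would replace this gated-MLP placeholder."""

    def __init__(self, d_model: int):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.in_proj = nn.Linear(d_model, 2 * d_model)
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x):
        h, gate = self.in_proj(self.norm(x)).chunk(2, dim=-1)
        return x + self.out_proj(h * torch.sigmoid(gate))


class MoEFeedForward(nn.Module):
    """Top-1 routed mixture-of-experts MLP (illustrative routing only; experts
    are computed densely here and selected by a mask for simplicity)."""

    def __init__(self, d_model: int, n_experts: int = 4):
        super().__init__()
        self.router = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                          nn.Linear(4 * d_model, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x):
        scores = self.router(x).softmax(dim=-1)   # (batch, seq, n_experts)
        top = scores.argmax(dim=-1)               # hard top-1 expert choice
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = (top == i).unsqueeze(-1)
            out = out + mask * expert(x)
        return x + out


class HybridBlock(nn.Module):
    """One attention layer followed by several Mamba layers, with an MoE MLP
    attached to every other layer -- the interleaving idea from the abstract,
    with made-up ratios."""

    def __init__(self, d_model: int, mamba_per_attention: int = 3):
        super().__init__()
        mixers = [AttentionLayer(d_model)] + [
            MambaLayerStub(d_model) for _ in range(mamba_per_attention)
        ]
        layers = []
        for i, mixer in enumerate(mixers):
            layers.append(mixer)
            if i % 2 == 1:  # MoE on every second layer (placeholder rule)
                layers.append(MoEFeedForward(d_model))
        self.layers = nn.Sequential(*layers)

    def forward(self, x):
        return self.layers(x)


if __name__ == "__main__":
    x = torch.randn(2, 16, 64)  # (batch, seq, d_model)
    print(HybridBlock(d_model=64)(x).shape)
```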