arxiv:2403.07816

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Published on Mar 12, 2024
· Submitted by AK on Mar 13, 2024
#3 Paper of the day
Authors:
Sainbayar Sukhbaatar, Olga Golovneva, Vasu Sharma, Hu Xu, Xi Victoria Lin, Baptiste Rozière, Jacob Kahn, Daniel Li, Wen-tau Yih, Jason Weston, Xian Li
Abstract

AI-generated summary: The Branch-Train-MiX (BTX) method enhances Large Language Models by asynchronously training experts in parallel and integrating them using Mixture-of-Expert layers with token-level routing for improved accuracy and efficiency.

We investigate efficient methods for training Large Language Models (LLMs) to possess capabilities in multiple specialized domains, such as coding, math reasoning and world knowledge. Our method, named Branch-Train-MiX (BTX), starts from a seed model, which is branched to train experts in embarrassingly parallel fashion with high throughput and reduced communication cost. After individual experts are asynchronously trained, BTX brings together their feedforward parameters as experts in Mixture-of-Expert (MoE) layers and averages the remaining parameters, followed by an MoE-finetuning stage to learn token-level routing. BTX generalizes two special cases, the Branch-Train-Merge method, which does not have the MoE finetuning stage to learn routing, and sparse upcycling, which omits the stage of training experts asynchronously. Compared to alternative approaches, BTX achieves the best accuracy-efficiency tradeoff.
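The merge step described in the abstract can be made concrete with a short sketch. The snippet below is a minimal illustration, not the paper's implementation: it assumes every domain expert is a checkpoint of the same seed architecture with identical parameter names, identifies feedforward weights by a hypothetical "feed_forward" substring, stacks them as MoE experts, averages everything else, and pairs the result with a simple top-k router of the kind that would be learned during the MoE-finetuning stage.

```python
# Minimal sketch of the BTX merge step (illustrative only).
# Assumptions: all expert checkpoints share the seed model's architecture and
# parameter names, and feedforward weights are identifiable by the substring
# "feed_forward" (hypothetical; adjust to the real checkpoint layout).
import torch

def btx_merge(expert_state_dicts):
    """Stack feedforward weights as MoE experts; average all other parameters."""
    merged, moe_experts = {}, {}
    for name in expert_state_dicts[0]:
        tensors = [sd[name] for sd in expert_state_dicts]
        if "feed_forward" in name:
            # FFN weights become the experts of an MoE layer: (num_experts, ...)
            moe_experts[name] = torch.stack(tensors)
        else:
            # Attention, embedding, and norm weights are simply averaged.
            merged[name] = torch.stack(tensors).float().mean(dim=0)
    return merged, moe_experts

class TopKRouter(torch.nn.Module):
    """Token-level router; its gate is what the MoE-finetuning stage learns."""
    def __init__(self, dim, num_experts, k=2):
        super().__init__()
        self.gate = torch.nn.Linear(dim, num_experts, bias=False)
        self.k = k

    def forward(self, x):                      # x: (num_tokens, dim)
        logits = self.gate(x)                  # (num_tokens, num_experts)
        weights, idx = torch.topk(logits, self.k, dim=-1)
        return torch.softmax(weights, dim=-1), idx  # mixing weights, expert ids
```

Per the abstract, dropping the router-learning finetuning stage from this recipe recovers Branch-Train-Merge, while skipping the asynchronous expert-training stage recovers sparse upcycling.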

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: @librarian-bot recommend

Unlocking the Future of AI: Branch-Train-MiX (BTX) Explained

Video: https://cdn-uploads.huggingface.co/production/uploads/6186ddf6a7717cb375090c01/2AUpnIkN4a3WrW7RUALk6.mp4

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix

@librarian-bot recommend


Models citing this paper 1

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2403.07816 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2403.07816 in a Space README.md to link it from this page.

Collections including this paper 24
