PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
AI-generated summary

PoSE training enhances the efficiency of large language models in handling long context windows by using simulated long inputs with manipulated position indices during training.

Abstract
In this paper, we introduce Positional Skip-wisE (PoSE) training for
efficient adaptation of large language models (LLMs) to extremely long context
windows. PoSE decouples training length from target context window size by
simulating long inputs using a fixed context window with manipulated position
indices during training. Concretely, we select several short chunks from a long
input sequence, and introduce distinct skipping bias terms to modify the
position indices of each chunk. These bias terms, along with the length of each
chunk, are altered for each training example, allowing the model to adapt to
all positions within the target context window without training on full-length
inputs. Experiments show that, compared with full-length fine-tuning,
PoSE greatly reduces memory and time overhead with minimal impact on
performance. Leveraging this advantage, we have successfully extended the LLaMA
model to 128k tokens. Furthermore, we empirically confirm that PoSE is
compatible with all RoPE-based LLMs and various position interpolation
strategies. Notably, by decoupling fine-tuning length from target context
window, PoSE can theoretically extend the context window infinitely,
constrained only by memory usage for inference. With ongoing advances in
efficient inference, we believe PoSE holds great promise for scaling the
context window even further.
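
To make the position-index manipulation concrete, below is a minimal, illustrative Python sketch of PoSE-style position IDs. It assumes a simplified sampling scheme (uniformly chosen chunk boundaries and non-decreasing skipping biases capped by the target length); the paper's exact sampling distribution may differ, and the function name `pose_position_ids` is purely for illustration.

```python
import random

def pose_position_ids(train_len: int, target_len: int, num_chunks: int = 2) -> list[int]:
    """Illustrative PoSE-style position indices (not the paper's exact sampler).

    Splits the fixed training window of `train_len` tokens into `num_chunks`
    contiguous chunks and shifts each chunk by a non-decreasing "skipping bias",
    so the resulting indices span positions inside [0, target_len) while only
    `train_len` tokens are actually processed.
    """
    assert target_len >= train_len and 1 <= num_chunks <= train_len

    # Randomly chosen chunk boundaries inside the training window.
    cut_points = sorted(random.sample(range(1, train_len), num_chunks - 1))
    starts = [0] + cut_points
    ends = cut_points + [train_len]

    # Non-decreasing skipping bias terms; the largest is capped so that the
    # final index still fits inside the target context window.
    max_total_skip = target_len - train_len
    biases = sorted(random.randint(0, max_total_skip) for _ in range(num_chunks))

    # Shift each chunk's original positions by its bias term. Because the biases
    # are non-decreasing, the resulting sequence stays strictly increasing.
    position_ids: list[int] = []
    for start, end, bias in zip(starts, ends, biases):
        position_ids.extend(range(start + bias, end + bias))
    return position_ids

# Example: a 2k-token training window simulating positions in a 16k target window.
ids = pose_position_ids(train_len=2048, target_len=16384, num_chunks=2)
print(len(ids), ids[0], ids[-1])  # 2048 tokens, indices falling within [0, 16384)
```

In a RoPE-based model such as LLaMA served through Hugging Face transformers, indices like these could be supplied via the `position_ids` argument of the forward pass, so the rotary embeddings see the skipped positions while only `train_len` tokens are attended to. This wiring is an assumption about how one might use the sketch, not a description of the authors' released training code.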