PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training
AI-generated summary

PoSE training enhances the efficiency of large language models in handling long context windows by using simulated long inputs with manipulated position indices during training.

Abstract
In this paper, we introduce Positional Skip-wisE (PoSE) training for
efficient adaptation of large language models (LLMs) to extremely long context
windows. PoSE decouples training length from target context window size by
simulating long inputs using a fixed context window with manipulated position
indices during training. Concretely, we select several short chunks from a long
input sequence, and introduce distinct skipping bias terms to modify the
position indices of each chunk. These bias terms, along with the length of each
chunk, are altered for each training example, allowing the model to adapt to
all positions within the target context window without training on full-length
inputs. Experiments show that, compared with full-length fine-tuning,
PoSE greatly reduces memory and time overhead with minimal impact on
performance. Leveraging this advantage, we have successfully extended the LLaMA
model to 128k tokens. Furthermore, we empirically confirm that PoSE is
compatible with all RoPE-based LLMs and various position interpolation
strategies. Notably, by decoupling fine-tuning length from target context
window, PoSE can theoretically extend the context window infinitely,
constrained only by memory usage for inference. With ongoing advances in
efficient inference, we believe PoSE holds great promise for scaling the
context window even further.
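
To make the position-index manipulation concrete, below is a minimal, illustrative Python sketch of PoSE-style position IDs. It assumes a simplified sampling scheme (uniformly chosen chunk boundaries and non-decreasing skipping biases capped by the target length); the paper's exact sampling distribution may differ, and the function name `pose_position_ids` is purely for illustration.

```python
import random

def pose_position_ids(train_len: int, target_len: int, num_chunks: int = 2) -> list[int]:
    """Illustrative PoSE-style position indices (not the paper's exact sampler).

    Splits the fixed training window of `train_len` tokens into `num_chunks`
    contiguous chunks and shifts each chunk by a non-decreasing "skipping bias",
    so the resulting indices span positions inside [0, target_len) while only
    `train_len` tokens are actually processed.
    """
    assert target_len >= train_len and 1 <= num_chunks <= train_len

    # Randomly chosen chunk boundaries inside the training window.
    cut_points = sorted(random.sample(range(1, train_len), num_chunks - 1))
    starts = [0] + cut_points
    ends = cut_points + [train_len]

    # Non-decreasing skipping bias terms; the largest is capped so that the
    # final index still fits inside the target context window.
    max_total_skip = target_len - train_len
    biases = sorted(random.randint(0, max_total_skip) for _ in range(num_chunks))

    # Shift each chunk's original positions by its bias term. Because the biases
    # are non-decreasing, the resulting sequence stays strictly increasing.
    position_ids: list[int] = []
    for start, end, bias in zip(starts, ends, biases):
        position_ids.extend(range(start + bias, end + bias))
    return position_ids

# Example: a 2k-token training window simulating positions in a 16k target window.
ids = pose_position_ids(train_len=2048, target_len=16384, num_chunks=2)
print(len(ids), ids[0], ids[-1])  # 2048 tokens, indices falling within [0, 16384)
```

In a RoPE-based model such as LLaMA served through Hugging Face transformers, indices like these could be supplied via the `position_ids` argument of the forward pass, so the rotary embeddings see the skipped positions while only `train_len` tokens are attended to. This wiring is an assumption about how one might use the sketch, not a description of the authors' released training code.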