Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon
Abstract
Activation Beacon enhances large language models' ability to process long contexts using compact activation forms, maintaining original capabilities and improving efficiency.
The utilization of long contexts poses a significant challenge for large language models due to their limited context window length. Although the context window can be extended through fine-tuning, doing so incurs considerable cost at both training and inference time and exerts an unfavorable impact on the LLM's original capabilities. In this work, we propose Activation Beacon, which condenses the LLM's raw activations into more compact forms so that it can perceive a much longer context with a limited context window. Activation Beacon is introduced as a plug-and-play module for the LLM. It fully preserves the LLM's original capability on short contexts while extending its capability to process longer contexts. Moreover, it works with short sliding windows to process the long context, achieving competitive memory and time efficiency in both training and inference. Activation Beacon is learned by the auto-regression task conditioned on a mixture of beacons with diversified condensing ratios. Thanks to such a treatment, it can be efficiently trained purely with short-sequence data in just 10K steps, which consumes less than 9 hours on a single 8xA800 GPU machine. The experimental studies show that Activation Beacon is able to extend Llama-2-7B's context length by 100x (from 4K to 400K), while achieving superior results on both long-context generation and understanding tasks. Our model and code will be available at the BGE repository.
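To make the core mechanism concrete, the following is a minimal, illustrative Python sketch of the idea described in the abstract: a long sequence is processed in short sliding windows, and each window's activations are condensed into a handful of compact "beacon" activations that are carried forward so the model sees a much longer effective context within a bounded window. This is not the paper's implementation; the real beacon module is learned jointly with the LLM, whereas mean pooling is used here only as a stand-in, and names such as `condense_window` and `process_long_context` are hypothetical.

```python
import torch

def condense_window(activations: torch.Tensor, ratio: int) -> torch.Tensor:
    """Stand-in for the learned beacon module: condense a window's
    activations by `ratio` via mean pooling over consecutive groups.
    (The paper learns this condensation end-to-end; pooling is illustrative.)"""
    seq_len, hidden = activations.shape
    usable = (seq_len // ratio) * ratio  # drop the ragged tail, if any
    return activations[:usable].view(-1, ratio, hidden).mean(dim=1)

def process_long_context(hidden_states: torch.Tensor,
                         window: int = 1024,
                         ratio: int = 8) -> torch.Tensor:
    """Slide a short window over a long sequence, carrying forward the
    condensed (beacon) activations of earlier windows so the effective
    context grows by roughly `ratio`x while memory stays bounded."""
    beacons = []
    for start in range(0, hidden_states.size(0), window):
        chunk = hidden_states[start:start + window]
        # Earlier windows are visible only through their condensed beacons.
        past = torch.cat(beacons, dim=0) if beacons else chunk[:0]
        _context = torch.cat([past, chunk], dim=0)  # what a frozen LLM layer would attend over
        beacons.append(condense_window(chunk, ratio))
    return torch.cat(beacons, dim=0)

# Example: a 16K-step sequence of 64-dim hidden states condensed 8x.
hidden_states = torch.randn(16_384, 64)
compact = process_long_context(hidden_states, window=1024, ratio=8)
print(compact.shape)  # torch.Size([2048, 64])
```

In this sketch, each 1024-step window contributes only 128 condensed vectors to later windows, which is how a 4K-window model could, in principle, cover a far longer input; the paper additionally trains with a mixture of condensing ratios so the module generalizes across compression levels.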
Community
Unlocking 400K Token Contexts in LLMs with Activation Beacon!
Links 🔗:
👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/
Models citing this paper: 52
Datasets citing this paper: 0