\n","updatedAt":"2024-01-22T13:58:08.517Z","author":{"_id":"63d3e0e8ff1384ce6c5dd17d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg","fullname":"Librarian Bot (Bot)","name":"librarian-bot","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":264}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.7094711065292358},"editors":["librarian-bot"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/1674830754237-63d3e0e8ff1384ce6c5dd17d.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2401.07004","authors":[{"_id":"65a77c2921943858bde10da1","user":{"_id":"63e8b792ca4fc7d30de6975b","avatarUrl":"/avatars/57237f54d61d479df15209497a3f531e.svg","isPro":false,"fullname":"Yikai Zhang","user":"Arist12","type":"user"},"name":"Yikai Zhang","status":"admin_assigned","statusLastChangedAt":"2024-01-17T09:44:57.954Z","hidden":false},{"_id":"65a77c2921943858bde10da2","user":{"_id":"621e40ac944c7e36aaec2369","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/621e40ac944c7e36aaec2369/Yj-FJRWps3rvsS_B2bnKo.jpeg","isPro":false,"fullname":"Junlong Li","user":"lockon","type":"user"},"name":"Junlong Li","status":"admin_assigned","statusLastChangedAt":"2024-01-17T09:46:21.036Z","hidden":false},{"_id":"65a77c2921943858bde10da3","user":{"_id":"6144a0c4ff1146bbd84d9865","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1661715958139-6144a0c4ff1146bbd84d9865.png","isPro":false,"fullname":"Pengfei Liu","user":"Pengfei","type":"user"},"name":"Pengfei Liu","status":"admin_assigned","statusLastChangedAt":"2024-01-17T09:46:04.774Z","hidden":false}],"publishedAt":"2024-01-13T07:57:01.000Z","submittedOnDailyAt":"2024-01-17T04:35:15.258Z","title":"Extending LLMs' Context Window with 100 Samples","submittedOnDailyBy":{"_id":"60f1abe7544c2adfd699860c","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/1674929746905-60f1abe7544c2adfd699860c.jpeg","isPro":false,"fullname":"AK","user":"akhaliq","type":"user"},"summary":"Large Language Models (LLMs) are known to have limited extrapolation ability\nbeyond their pre-trained context window, constraining their application in\ndownstream tasks with lengthy inputs. Recent studies have sought to extend\nLLMs' context window by modifying rotary position embedding (RoPE), a popular\nposition encoding method adopted by well-known LLMs such as LLaMA, PaLM, and\nGPT-NeoX. However, prior works like Position Interpolation (PI) and YaRN are\nresource-intensive and lack comparative experiments to assess their\napplicability. In this work, we identify the inherent need for LLMs' attention\nentropy (i.e. the information entropy of attention scores) to maintain\nstability and introduce a novel extension to RoPE which combines adjusting\nRoPE's base frequency and scaling the attention logits to help LLMs efficiently\nadapt to a larger context window. We validate the superiority of our method in\nboth fine-tuning performance and robustness across different context window\nsizes on various context-demanding tasks. Notably, our method extends the\ncontext window of LLaMA-2-7B-Chat to 16,384 with only 100 samples and 6\ntraining steps, showcasing extraordinary efficiency. 
Finally, we also explore\nhow data compositions and training curricula affect context window extension\nfor specific downstream tasks, suggesting fine-tuning LLMs with lengthy\nconversations as a good starting point. We release our code and SFT data at\nhttps://github.com/GAIR-NLP/Entropy-ABF.","upvotes":16,"discussionId":"65a77c2b21943858bde10e00","ai_summary":"A novel extension to rotary position embedding, adjusted by modifying the base frequency and scaling attention logits, enhances LLMs' context window efficiently and robustly across various tasks.","ai_keywords":["rotary position embedding","attention entropy","attention scores","Position Interpolation (PI)","YaRN","attention logits","LLaMA","PaLM","GPT-NeoX","fine-tuning performance","data compositions","training curricula","LLaMA-2-7B-Chat"]},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"620783f24e28382272337ba4","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/620783f24e28382272337ba4/zkUveQPNiDfYjgGhuFErj.jpeg","isPro":false,"fullname":"GuoLiangTang","user":"Tommy930","type":"user"},{"_id":"621e40ac944c7e36aaec2369","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/621e40ac944c7e36aaec2369/Yj-FJRWps3rvsS_B2bnKo.jpeg","isPro":false,"fullname":"Junlong Li","user":"lockon","type":"user"},{"_id":"63e8b792ca4fc7d30de6975b","avatarUrl":"/avatars/57237f54d61d479df15209497a3f531e.svg","isPro":false,"fullname":"Yikai Zhang","user":"Arist12","type":"user"},{"_id":"6459483e3025ccf764afbf5f","avatarUrl":"/avatars/d86d7fc4652b6ae8ee61490d8f8758e1.svg","isPro":false,"fullname":"Derek Lewis","user":"delewis","type":"user"},{"_id":"6538119803519fddb4a17e10","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/6538119803519fddb4a17e10/ffJMkdx-rM7VvLTCM6ri_.jpeg","isPro":false,"fullname":"samusenps","user":"samusenps","type":"user"},{"_id":"645abf037d8ec84b7b5d8aa6","avatarUrl":"/avatars/3f68cbabcdfc96de3ef84d094ba1d60c.svg","isPro":false,"fullname":"Sidharth Baskaran","user":"sidnb13","type":"user"},{"_id":"5fcb4ec4835012afdc38cb29","avatarUrl":"/avatars/689ccd722bf64220364b9601d0bc3a7b.svg","isPro":false,"fullname":"kiran","user":"kira","type":"user"},{"_id":"63f8fde44ef4aacb65a00f42","avatarUrl":"/avatars/58de8f5f4bda51a1a63444e8b1113453.svg","isPro":false,"fullname":"Christian Gheorghe","user":"christiangheorghe","type":"user"},{"_id":"6101c620900eaa0057c2ce1d","avatarUrl":"/avatars/bd282166c120711c65b5409dc860ac58.svg","isPro":false,"fullname":"Abdel-Dayane Marcos","user":"admarcosai","type":"user"},{"_id":"63107b18e87051f3e3e0f598","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63107b18e87051f3e3e0f598/R9onir4Y0MZuq1jEWCZ2-.jpeg","isPro":false,"fullname":"Unchun Yang","user":"ucyang","type":"user"},{"_id":"651d618a18be7acf8e602c41","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/kEDoJKsGXpNDTOiU7FRMP.jpeg","isPro":false,"fullname":"Abreu Magalhães","user":"Hildeberto","type":"user"},{"_id":"61f4d468587c793cdf55b4dd","avatarUrl":"/avatars/ce597d8d2640c726473dd85ae8c5cdc7.svg","isPro":false,"fullname":"Lee Gao","user":"leegao19","type":"user"}],"acceptLanguages":["*"],"dailyPaperRank":0}">Extending LLMs' Context Window with 100 Samples
Abstract
A novel extension to rotary position embedding that adjusts the base frequency and scales the attention logits extends LLMs' context window efficiently and robustly across various tasks.
Large Language Models (LLMs) are known to have limited extrapolation ability beyond their pre-trained context window, constraining their application in downstream tasks with lengthy inputs. Recent studies have sought to extend LLMs' context window by modifying rotary position embedding (RoPE), a popular position encoding method adopted by well-known LLMs such as LLaMA, PaLM, and GPT-NeoX. However, prior works like Position Interpolation (PI) and YaRN are resource-intensive and lack comparative experiments to assess their applicability. In this work, we identify the inherent need for LLMs' attention entropy (i.e. the information entropy of attention scores) to maintain stability and introduce a novel extension to RoPE which combines adjusting RoPE's base frequency and scaling the attention logits to help LLMs efficiently adapt to a larger context window. We validate the superiority of our method in both fine-tuning performance and robustness across different context window sizes on various context-demanding tasks. Notably, our method extends the context window of LLaMA-2-7B-Chat to 16,384 with only 100 samples and 6 training steps, showcasing extraordinary efficiency. Finally, we also explore how data compositions and training curricula affect context window extension for specific downstream tasks, suggesting fine-tuning LLMs with lengthy conversations as a good starting point. We release our code and SFT data at https://github.com/GAIR-NLP/Entropy-ABF.
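To make the two ingredients the abstract names more concrete, the sketch below shows where adjusting RoPE's base frequency and scaling the attention logits would plug into a standard attention stack. This is a minimal illustration, not the authors' released implementation: the enlarged base of 50,000, the 4,096-token training window, the 16,384-token target window, and the log-ratio scaling factor are all assumed values chosen for the example rather than figures taken from the paper.

```python
# Minimal sketch (not the authors' code) of the two ideas in the abstract:
# (1) enlarging RoPE's base frequency and (2) scaling attention logits so
# attention entropy stays roughly stable at a longer context window.
import math
import torch

def rope_inv_freq(head_dim: int, base: float = 10_000.0) -> torch.Tensor:
    """Inverse frequencies used by rotary position embedding."""
    return 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))

def rope_angles(seq_len: int, head_dim: int, base: float) -> torch.Tensor:
    """Per-position rotation angles; a larger base slows the rotation,
    which is what 'adjusting RoPE's base frequency' refers to."""
    inv_freq = rope_inv_freq(head_dim, base)
    positions = torch.arange(seq_len).float()
    return torch.outer(positions, inv_freq)  # (seq_len, head_dim // 2)

def scaled_attention_logits(q, k, train_len: int, target_len: int):
    """Scale q·k^T logits when the window grows; the log-ratio factor
    mirrors common entropy-aware scaling and is an assumption here,
    not necessarily the paper's exact formula."""
    head_dim = q.size(-1)
    scale = math.log(target_len) / math.log(train_len)
    logits = (q @ k.transpose(-2, -1)) / math.sqrt(head_dim)
    return logits * scale

# Usage sketch: angles for the original vs. extended window (assumed sizes).
angles_orig = rope_angles(seq_len=4_096, head_dim=128, base=10_000.0)
angles_ext = rope_angles(seq_len=16_384, head_dim=128, base=50_000.0)  # assumed new base
q = torch.randn(1, 8, 16, 128)  # (batch, heads, tokens, head_dim)
k = torch.randn(1, 8, 16, 128)
logits = scaled_attention_logits(q, k, train_len=4_096, target_len=16_384)
```

For the exact entropy-aware ABF recipe and the 100-sample fine-tuning setup, see the authors' repository linked in the abstract.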
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- LLM Maybe LongLM: Self-Extend LLM Context Window Without Tuning (2024)
- Zebra: Extending Context Window with Layerwise Grouped Local-Global Attention (2023)
- E^2-LLM: Efficient and Extreme Length Extension of Large Language Models (2024)
- Extending Context Window of Large Language Models via Semantic Compression (2023)
- The What, Why, and How of Context Length Extension Techniques in Large Language Models -- A Detailed Survey (2024)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any paper on Hugging Face, check out this Space
Models citing this paper 0
No model linking this paper
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper