stereoplegic (2024-07-30): @librarian-bot recommend
librarian-bot (2024-07-30): This is an automated message from the [Librarian Bot](https://huggingface.co/librarian-bots). I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API:

* [QuickLLaMA: Query-aware Inference Acceleration for Large Language Models](https://huggingface.co/papers/2406.07528) (2024)
* [LoCoCo: Dropping In Convolutions for Long Context Compression](https://huggingface.co/papers/2406.05317) (2024)
* [CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling](https://huggingface.co/papers/2406.12018) (2024)
* [Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent "Middle" Enhancement](https://huggingface.co/papers/2406.07138) (2024)
* [Farewell to Length Extrapolation, a Training-Free Infinite Context with Finite Attention Scope](https://huggingface.co/papers/2407.15176) (2024)

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out [this Space](https://huggingface.co/spaces/librarian-bots/recommend_similar_papers).

You can directly ask Librarian Bot for paper recommendations by tagging it in a comment: `@librarian-bot recommend`
LongHeads: Multi-Head Attention is Secretly a Long Context Processor

Authors: Yi Lu, Xin Zhou, Wei He, Jun Zhao, Tao Ji, Tao Gui, Qi Zhang, Xuanjing Huang
Published: 2024-02-16 (https://huggingface.co/papers/2402.10685)
AI-generated summary

LongHeads enhances long context understanding in large language models by optimizing attention mechanisms to process longer sequences efficiently without additional training.
Abstract

Large language models (LLMs) have achieved impressive performance in numerous domains but often struggle to process lengthy inputs effectively and efficiently, owing to limited length generalization and the quadratic computational cost of attention. Many approaches seek to mitigate this by restricting the attention window to the pre-trained length; however, these methods introduce new issues, such as ignoring the middle of the context and requiring additional training. To address these problems, we propose LongHeads, a training-free framework that enhances LLMs' long-context ability by unlocking the untapped potential of multi-head attention. Instead of letting each head attend to the full sequence, which generalizes poorly to longer inputs because of out-of-distribution (OOD) positions, we let each head process an in-distribution length by selecting and attending to important context chunks. To this end, we propose a chunk selection strategy that relies on the inherent correlation between the query and the key representations, efficiently distributing context chunks across different heads. In this way, each head is guaranteed to process its attended tokens within the trained length, while different heads in different layers collectively cover longer contexts. LongHeads runs in linear time and fits seamlessly with the many LLMs that use relative positional encoding. Our extensive empirical analyses verify the efficacy of LongHeads in extending the usable context window of existing models, showcasing its promise for enhancing long-text understanding.
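To make the chunk-selection idea concrete, here is a minimal PyTorch sketch of per-head chunk selection. It is an illustration under stated assumptions rather than the paper's implementation: the chunk representation (mean-pooled keys), the chunk size, the number of selected chunks, and the rule of always retaining the first and most recent chunk are all assumptions made for this example.

```python
# Minimal sketch of per-head chunk selection in the spirit of LongHeads.
# Hypothetical details (not from the paper's code): chunks are summarized
# by mean-pooled keys, and every head always keeps the first and the most
# recent chunk in addition to its highest-scoring ones.
import torch

def select_chunks_per_head(q, k, chunk_size=256, n_select=4):
    """q: (n_heads, d) current query; k: (n_heads, seq_len, d) cached keys.
    Returns (n_heads, n_select) indices of the chunks each head attends to."""
    n_heads, seq_len, d = k.shape
    n_chunks = seq_len // chunk_size
    # Summarize each chunk by the mean of its keys (an assumption).
    chunk_repr = (
        k[:, : n_chunks * chunk_size]
        .reshape(n_heads, n_chunks, chunk_size, d)
        .mean(dim=2)
    )
    # Score every chunk against this head's current query via dot product,
    # exploiting the query-key correlation the abstract describes.
    scores = torch.einsum("hd,hcd->hc", q, chunk_repr)  # (n_heads, n_chunks)
    # Force-select the first and the latest chunk (assumed here for
    # stability and local context) by giving them infinite scores.
    scores[:, 0] = float("inf")
    scores[:, -1] = float("inf")
    idx = scores.topk(min(n_select, n_chunks), dim=-1).indices
    # Keep selected chunks in positional order so positions stay monotonic.
    return idx.sort(dim=-1).values

# Example: 8 heads, 64-dim, a 4096-token cache -> 16 chunks, pick 4 per head.
q = torch.randn(8, 64)
k = torch.randn(8, 4096, 64)
print(select_chunks_per_head(q, k).shape)  # torch.Size([8, 4])
```

Each head would then run standard attention over only its selected chunks, re-indexed with contiguous in-distribution positions; that re-indexing is what keeps every head within its trained length while different heads collectively cover the full context.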