LLoCO: Learning Long Contexts Offline
Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, Joseph E. Gonzalez, Raluca Ada Popa
AI-generated summary
The LLoCO method enhances large language models' ability to handle long contexts efficiently by combining context compression, retrieval, and parameter-efficient finetuning.
Processing long contexts remains a challenge for large language models (LLMs)
due to the quadratic computational and memory overhead of the self-attention
mechanism and the substantial KV cache sizes during generation. We propose a
novel approach to address this problem by learning contexts offline through
context compression and in-domain parameter-efficient finetuning. Our method
enables an LLM to create a concise representation of the original context and
efficiently retrieve relevant information to answer questions accurately. We
introduce LLoCO, a technique that combines context compression, retrieval, and
parameter-efficient finetuning using LoRA. Our approach extends the effective
context window of a 4k token LLaMA2-7B model to handle up to 128k tokens. We
evaluate our approach on several long-context question-answering datasets,
demonstrating that LLoCO significantly outperforms in-context learning while
using 30× fewer tokens during inference. LLoCO achieves up to
7.62× speed-up and substantially reduces the cost of long document
question answering, making it a promising solution for efficient long context
processing. Our code is publicly available at
https://github.com/jeffreysijuntan/lloco.
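
For intuition, here is a minimal sketch of the recipe in Python, assuming the Hugging Face `transformers` and `peft` libraries. The model ID, the `compress` stub, and the training data are illustrative placeholders, not the paper's actual implementation; the real code is in the linked repo, and per the paper the compressor is AutoCompressor-style.

```python
# Minimal sketch of the LLoCO-style recipe -- NOT the paper's implementation.
# The `compress` stub and data are placeholders; only the LoRA wiring below
# uses real `transformers`/`peft` APIs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-2-7b-hf"  # the 4k-context base model from the paper
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.float16)

def compress(document: str) -> torch.Tensor:
    """Step 1 (offline): map a long document to a short sequence of summary
    embeddings (~30x shorter). Placeholder -- LLoCO uses an
    AutoCompressor-style encoder here."""
    raise NotImplementedError

# Step 2 (offline): attach a LoRA adapter and finetune only its weights on
# in-domain (compressed context, question, answer) examples.
lora_cfg = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # base weights stay frozen

# Step 3 (online): retrieve the compressed representation of the relevant
# document and feed it in place of the raw text, so a 128k-token document
# fits within the 4k-token window.
```

The design point is that the expensive work (compression and adapter training) happens offline, so online inference only pays for the short compressed prefix rather than the full document.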
1. Generate the answers to all the questions that you want to ask (sketched below).
2. Finetune it.
3. Voila. You can now ask it questions...
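
To make the objection concrete, here is a rough sketch of that loop, assuming a generic `transformers` text-generation pipeline; the model, prompt, and document are all hypothetical stand-ins:

```python
# Hypothetical sketch of the workflow above; the model, prompt, and document
# are stand-ins, not anything from the paper or its repo.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

document = "..."  # the new text you actually wanted to ask questions about

# Step 1: you must synthesize the QA pairs yourself, offline -- i.e.,
# answer your own questions before the system can answer them for you.
prompt = f"Document:\n{document}\n\nWrite one question about it and its answer:\n"
qa_text = generator(prompt, max_new_tokens=64)[0]["generated_text"]

# Step 2: finetune (e.g., a LoRA adapter) on many such synthetic pairs.
# Step 3: only after finetuning can you query the model about the document.
```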
It's a bit cumbersome, and for the use case described, it defeats its own purpose (you have to generate the QA pairs! In the real world, these don't exist yet, which is the whole reason for doing the QA in the first place).
I'm sure what you've built works great in certain circumstances (stable, well-known books like the Bible), but for real-world, on-the-fly use cases (newly released books, legal texts, confidential data, etc.) this is cracking a nut with a sledgehammer, only to find you already had a pocketful of cracked nuts.