Papers
arxiv:2401.06951

E^2-LLM: Efficient and Extreme Length Extension of Large Language Models

Published on Jan 13, 2024
· Submitted by AK on Jan 17, 2024
Authors:
Jiaheng Liu, Zhiqi Bai, Yuanxing Zhang, Chenchen Zhang, Yu Zhang, Ge Zhang, Jiakai Wang, Haoran Que, Yukang Chen, Wenbo Su, Tiezheng Ge, Jie Fu, Wenhu Chen, Bo Zheng

Abstract

E^2-LLM extends long-context capabilities in LLMs with reduced computation and data requirements through a single training procedure, short training context windows, and novel position-embedding augmentations.

AI-generated summary

Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. Existing long-context extension methods usually need additional training procedures to support the corresponding long-context windows, where long-context training data (e.g., 32k) is needed and high GPU training costs are assumed. To address these issues, we propose an Efficient and Extreme length extension method for Large Language Models, called E^2-LLM, with only one training procedure and dramatically reduced computation cost, which also removes the need to collect long-context data. Concretely, first, the training data of our E^2-LLM only requires a short length (e.g., 4k), which greatly reduces the tuning cost. Second, the training procedure on the short training context window is performed only once, and we can support different evaluation context windows at inference. Third, in E^2-LLM, based on RoPE position embeddings, we introduce two different augmentation methods on the scale and position index parameters for different samples in training. This aims to make the model more robust to different relative position distances when directly interpolating to an arbitrary context length at inference. Comprehensive experimental results on multiple benchmark datasets demonstrate the effectiveness of our E^2-LLM on challenging long-context tasks.
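To make the augmentation described in the abstract more concrete, here is a minimal PyTorch sketch of RoPE with per-sample scale and position-index augmentation during training on short windows. The helper names (`rope_angles`, `sample_augmented_positions`), the sampling ranges, and the `max_scale` parameter are illustrative assumptions, not the authors' implementation.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """Standard RoPE rotation angles for (possibly fractional) position indices."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    return torch.outer(positions.float(), inv_freq)  # shape: (seq_len, dim // 2)

def sample_augmented_positions(seq_len: int, max_scale: int) -> torch.Tensor:
    """Hypothetical per-sample augmentation in the spirit of E^2-LLM:
    draw a scale factor g and a start offset t, then feed the interpolated
    indices (t + i) / g to RoPE, so a short training window stands in for a
    slice of a much longer context."""
    g = int(torch.randint(1, max_scale + 1, (1,)))          # scale augmentation
    t = int(torch.randint(0, (g - 1) * seq_len + 1, (1,)))  # position-index augmentation
    return (t + torch.arange(seq_len, dtype=torch.float32)) / g

# Example: 4k training windows impersonating slices of up to a 32k context (g <= 8).
pos = sample_augmented_positions(seq_len=4096, max_scale=8)
angles = rope_angles(pos, dim=128)
print(pos.min().item(), pos.max().item(), angles.shape)
```

Because different samples see different scales and offsets, the model is exposed to a dense range of fractional relative positions during a single short-context training run, which is what allows it to support different interpolation factors at inference.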

Community

This is an automated message from the Librarian Bot. I found the following papers similar to this paper.

The following papers were recommended by the Semantic Scholar API

Please give a thumbs up to this comment if you found it helpful!

If you want recommendations for any paper on Hugging Face, check out this Space

Hi folks, after performing the fine-tuning with sampled scale and shift, do you see the resulting model improve extrapolation at long sequences (beyond the previously trained context window) without scaling up g, i.e., for free?

A common suspicion is that self-attention overfits the (admittedly very sparse) integer relative positions (e.g., 0 .. 2048 or 0 .. 4096), coupled with some approximation-theoretic failure. This could be why extrapolation fails so catastrophically: the attention doesn't learn the representations needed to use the rotary encoding properly (e.g., its rotational invariance) and instead overfits an approximation (maybe a polynomial) that breaks down at the training boundary.

The scheme presented in $E^2$-LLM seems to resolve the sparsity issue, and if the suspicion is correct, you should also see a corresponding improvement in extrapolation without Positional Interpolation during inference (as long as self-attention finds a way to learn the proper representation for the rotary encoding).
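For readers following this question, the sketch below contrasts the two inference regimes being discussed: plain extrapolation (raw integer positions beyond the trained window, g = 1) versus Positional Interpolation (positions compressed by g back into the trained range) when building RoPE angles. It is an illustrative toy; the window lengths and the `rope_angles` helper are assumed values, not settings from the paper.

```python
import torch

def rope_angles(positions: torch.Tensor, dim: int, base: float = 10000.0) -> torch.Tensor:
    """RoPE rotation angles for a sequence of (possibly fractional) positions."""
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    return torch.outer(positions.float(), inv_freq)

trained_window = 4096   # assumed training context length
eval_len = 16384        # assumed evaluation length
dim = 128

# Extrapolation: raw positions past the trained window, no scaling (g = 1).
extrapolated = rope_angles(torch.arange(eval_len), dim)

# Positional Interpolation: compress positions into the trained range with g = eval_len / trained_window.
g = eval_len / trained_window
interpolated = rope_angles(torch.arange(eval_len) / g, dim)

# The question above is whether E^2-LLM-style training lets the first variant
# avoid the usual perplexity blow-up past position `trained_window`.
print(extrapolated.shape, interpolated.shape)
```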

Extending AI's Memory: E2-LLM Breakthrough in Large Language Models

Links 🔗:

👉 Subscribe: https://www.youtube.com/@Arxflix
👉 Twitter: https://x.com/arxflix
👉 LMNT (Partner): https://lmnt.com/

By Arxflix


Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2401.06951 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2401.06951 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2401.06951 in a Space README.md to link it from this page.

Collections including this paper 14
