Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, Dan Alistarh

Published: 2024-01-11 · arXiv:2401.06118
AI-generated summary: Additive Quantization, adapted from Multi-Codebook Quantization, achieves state-of-the-art accuracy in extremely low-bit LLM compression.

Abstract
The emergence of accurate open large language models (LLMs) has led to a race towards quantization techniques that enable their execution on end-user devices. In this paper, we revisit the problem of "extreme" LLM compression, defined as targeting extremely low bit counts (2 to 3 bits per parameter), from the point of view of classic methods in Multi-Codebook Quantization (MCQ). Our work builds on Additive Quantization, a classic algorithm from the MCQ family, and adapts it to the quantization of language models. The resulting algorithm advances the state of the art in LLM compression, outperforming all recently proposed techniques in accuracy at a given compression budget. For instance, when compressing Llama 2 models to 2 bits per parameter, our algorithm quantizes the 7B model to 6.93 perplexity (a 1.29 improvement over the best prior work, and 1.81 points from FP16), the 13B model to 5.70 perplexity (a 0.36 improvement), and the 70B model to 3.94 perplexity (a 0.22 improvement) on WikiText2. We release our implementation of Additive Quantization for Language Models (AQLM) as a baseline to facilitate future research in LLM quantization.
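To make the multi-codebook idea concrete, here is a minimal NumPy sketch (illustrative only, not the authors' implementation): each group of g weights is approximated by the sum of one codeword from each of M codebooks, so storing the code indices costs M·log2(K)/g bits per parameter. The greedy residual encoder below stands in for AQLM's beam search, and the random codebooks stand in for learned ones; g, M, and K are assumed toy values.

```python
# Minimal additive (multi-codebook) quantization sketch -- toy values for
# g, M, K; AQLM learns its codebooks and encodes with beam search instead.
import numpy as np

rng = np.random.default_rng(0)

g = 8      # weights per group (assumed group size)
M = 2      # number of additive codebooks
K = 256    # codewords per codebook -> log2(K) = 8 bits per index

# Random stand-ins for learned codebooks: one (K, g) table per codebook.
codebooks = rng.normal(size=(M, K, g)).astype(np.float32)

def encode(w):
    """Greedily pick, per codebook, the codeword closest to the residual."""
    residual = w.copy()
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        dists = ((codebooks[m] - residual) ** 2).sum(axis=1)
        codes[m] = int(dists.argmin())
        residual -= codebooks[m, codes[m]]
    return codes

def decode(codes):
    """Reconstruct the group as a sum of one codeword per codebook."""
    return sum(codebooks[m, codes[m]] for m in range(M))

w = rng.normal(size=g).astype(np.float32)
w_hat = decode(encode(w))

# Index cost per parameter: M * log2(K) / g = 2 * 8 / 8 = 2 bits here
# (codebook storage is amortized over the whole weight matrix).
print("bits/parameter:", M * np.log2(K) / g)
print("reconstruction error:", float(np.linalg.norm(w - w_hat)))
```

At 2 bits per parameter, a 7B-parameter model's weights shrink from roughly 14 GB in FP16 to roughly 1.75 GB, which is what makes execution on end-user devices plausible.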
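The released implementation integrates with the Hugging Face ecosystem. A hedged usage sketch follows: it assumes the `aqlm` inference kernels are installed (`pip install aqlm[gpu]`) and uses a prequantized checkpoint ID believed to be from the authors' release; check the AQLM repository for the exact, current model IDs.

```python
# Hedged usage sketch: assumes `aqlm` and a recent transformers are
# installed, and that the checkpoint ID below matches the AQLM release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Additive quantization represents weights as",
                   return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```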