Extreme Compression of Large Language Models via Additive Quantization

Vage Egiazarian, Andrei Panferov, Denis Kuznedelev, Elias Frantar, Artem Babenko, Dan Alistarh

Published: 2024-01-11 · arXiv:2401.06118
AI-generated summary: Additive Quantization, adapted from Multi-Codebook Quantization, achieves state-of-the-art accuracy in extremely low-bit LLM compression.

Abstract
The emergence of accurate open large language models (LLMs) has led to a race towards quantization techniques that enable their execution on end-user devices. In this paper, we revisit the problem of "extreme" LLM compression, defined as targeting extremely low bit counts (2 to 3 bits per parameter), from the point of view of classic methods in Multi-Codebook Quantization (MCQ). Our work builds on Additive Quantization, a classic algorithm from the MCQ family, and adapts it to the quantization of language models. The resulting algorithm advances the state of the art in LLM compression, outperforming all recently proposed techniques in accuracy at a given compression budget. For instance, when compressing Llama 2 models to 2 bits per parameter, our algorithm quantizes the 7B model to 6.93 perplexity (a 1.29 improvement over the best prior work, and 1.81 points from FP16), the 13B model to 5.70 perplexity (a 0.36 improvement), and the 70B model to 3.94 perplexity (a 0.22 improvement) on WikiText2. We release our implementation of Additive Quantization for Language Models (AQLM) as a baseline to facilitate future research in LLM quantization.
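To make the multi-codebook idea concrete, here is a minimal NumPy sketch (illustrative only, not the authors' implementation): each group of g weights is approximated by the sum of one codeword from each of M codebooks, so storing the code indices costs M·log2(K)/g bits per parameter. The greedy residual encoder below stands in for AQLM's beam search, and the random codebooks stand in for learned ones; g, M, and K are assumed toy values.

```python
# Minimal additive (multi-codebook) quantization sketch -- toy values for
# g, M, K; AQLM learns its codebooks and encodes with beam search instead.
import numpy as np

rng = np.random.default_rng(0)

g = 8      # weights per group (assumed group size)
M = 2      # number of additive codebooks
K = 256    # codewords per codebook -> log2(K) = 8 bits per index

# Random stand-ins for learned codebooks: one (K, g) table per codebook.
codebooks = rng.normal(size=(M, K, g)).astype(np.float32)

def encode(w):
    """Greedily pick, per codebook, the codeword closest to the residual."""
    residual = w.copy()
    codes = np.empty(M, dtype=np.int64)
    for m in range(M):
        dists = ((codebooks[m] - residual) ** 2).sum(axis=1)
        codes[m] = int(dists.argmin())
        residual -= codebooks[m, codes[m]]
    return codes

def decode(codes):
    """Reconstruct the group as a sum of one codeword per codebook."""
    return sum(codebooks[m, codes[m]] for m in range(M))

w = rng.normal(size=g).astype(np.float32)
w_hat = decode(encode(w))

# Index cost per parameter: M * log2(K) / g = 2 * 8 / 8 = 2 bits here
# (codebook storage is amortized over the whole weight matrix).
print("bits/parameter:", M * np.log2(K) / g)
print("reconstruction error:", float(np.linalg.norm(w - w_hat)))
```

At 2 bits per parameter, a 7B-parameter model's weights shrink from roughly 14 GB in FP16 to roughly 1.75 GB, which is what makes execution on end-user devices plausible.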
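The released implementation integrates with the Hugging Face ecosystem. A hedged usage sketch follows: it assumes the `aqlm` inference kernels are installed (`pip install aqlm[gpu]`) and uses a prequantized checkpoint ID believed to be from the authors' release; check the AQLM repository for the exact, current model IDs.

```python
# Hedged usage sketch: assumes `aqlm` and a recent transformers are
# installed, and that the checkpoint ID below matches the AQLM release.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ISTA-DASLab/Llama-2-7b-AQLM-2Bit-1x16-hf"  # assumed checkpoint ID
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Additive quantization represents weights as",
                   return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=32)[0]))
```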