Transformers documentation

EETQ

The EETQ library supports int8 per-channel weight-only quantization for NVIDIA GPUs. Its high-performance GEMM and GEMV kernels come from FasterTransformer and TensorRT-LLM. It requires no calibration dataset, and the model does not have to be quantized ahead of time. Thanks to per-channel quantization, the accuracy loss is negligible.
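
To see what per-channel weight-only quantization means in practice, here is a minimal PyTorch sketch (an illustration only, not EETQ's fused CUDA kernels): each output channel of a weight matrix gets its own int8 scale, so a large value in one channel does not inflate the quantization error of the others.

import torch

def quantize_per_channel_int8(weight):
    # One scale per output channel, chosen so that channel's largest magnitude maps to 127.
    scales = weight.abs().amax(dim=1, keepdim=True) / 127.0
    q_weight = torch.clamp(torch.round(weight / scales), -127, 127).to(torch.int8)
    return q_weight, scales

def dequantize_per_channel_int8(q_weight, scales):
    # At matmul time the int8 weights are rescaled back to the compute dtype.
    return q_weight.to(scales.dtype) * scales

w = torch.randn(4096, 4096)
q, s = quantize_per_channel_int8(w)
print((dequantize_per_channel_int8(q, s) - w).abs().max())  # error stays small thanks to per-channel scales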

Make sure eetq is installed from its release page:

pip install --no-cache-dir https://github.com/NetEase-FuXi/EETQ/releases/download/v1.0.0/EETQ-1.0.0+cu121+torch2.1.2-cp310-cp310-linux_x86_64.whl

Alternatively, you can install it from source at https://github.com/NetEase-FuXi/EETQ. EETQ requires a CUDA compute capability of at least 7.0 and at most 8.9.

git clone https://github.com/NetEase-FuXi/EETQ.git
cd EETQ/
git submodule update --init --recursive
pip install .
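
If you are unsure whether your GPU falls in the supported range, a quick check with PyTorch (assuming torch is already installed and a GPU is visible) looks like this:

import torch

# EETQ targets GPUs with compute capability between 7.0 and 8.9.
major, minor = torch.cuda.get_device_capability()
capability = major + minor / 10
print(f"Compute capability: {capability}")
assert 7.0 <= capability <= 8.9, "EETQ requires CUDA capability between 7.0 and 8.9"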

An unquantized model can be quantized on the fly via from_pretrained():

from transformers import AutoModelForCausalLM, EetqConfig
path = "/path/to/model".
quantization_config = EetqConfig("int8")
model = AutoModelForCausalLM.from_pretrained(path, device_map="auto", quantization_config=quantization_config)
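
The quantized model behaves like any other Transformers model, so the usual generation API works as-is. A short usage sketch (the prompt is just an example):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(path)
inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))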

The quantized model can be saved with save_pretrained() and loaded again later via from_pretrained():

quant_path = "/path/to/save/quantized/model"
model.save_pretrained(quant_path)
model = AutoModelForCausalLM.from_pretrained(quant_path, device_map="auto")