DreamBooth

DreamBooth is a method for personalizing text-to-image models such as Stable Diffusion using just a few (3-5) images of a subject. It lets the model generate contextualized images of the subject in different scenes, poses, and views.

DreamBooth examples from the project's blog.

This guide shows you how to fine-tune DreamBooth with the CompVis/stable-diffusion-v1-4 model for various GPU sizes and with Flax. If you want to dig deeper and see how things work, all the DreamBooth training scripts used in this guide are in the examples/dreambooth directory of the Diffusers repository.

Before running the scripts, install the library's training dependencies. We also recommend installing 🧨 Diffusers from the main GitHub branch:

pip install git+https://github.com/huggingface/diffusers
pip install -U -r diffusers/examples/dreambooth/requirements.txt

xFormers is not a training requirement, but we recommend installing it if you can, because it can make training faster and less memory intensive.

After all the dependencies have been set up, initialize a 🤗 Accelerate environment with:

accelerate config

To set up a default 🤗 Accelerate environment without choosing any configuration options, run:

accelerate config default

Or, if your environment doesn't support an interactive shell (a notebook, for example), you can use:

from accelerate.utils import write_basic_config

write_basic_config()

Fine-tuning

DreamBooth fine-tuning is very sensitive to hyperparameters and overfits easily. To help you choose appropriate hyperparameters, we recommend reading the in-depth analysis, which includes a range of recommended settings.

PyTorch

Let's try DreamBooth with a few images of a dog. Download them, save them to a directory, and set the INSTANCE_DIR environment variable to that path:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export OUTPUT_DIR="path_to_saved_model"

Then launch the training script with the following command (the full script, train_dreambooth.py, is in the examples/dreambooth directory):

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=400
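
If you are working from a notebook or a Python script, it can be convenient to assemble the same command programmatically. The following is a minimal sketch: `build_launch_command` is a hypothetical helper (not part of Diffusers or Accelerate), and the option values simply mirror the command above.

```python
import shlex


def build_launch_command(script, options, flags=()):
    """Return an `accelerate launch` command as a list of arguments."""
    command = ["accelerate", "launch", script]
    # Key/value options become --key=value, in insertion order.
    command += [f"--{name}={value}" for name, value in options.items()]
    # Boolean switches become bare --flag entries.
    command += [f"--{flag}" for flag in flags]
    return command


options = {
    "pretrained_model_name_or_path": "CompVis/stable-diffusion-v1-4",
    "instance_data_dir": "path_to_training_images",
    "output_dir": "path_to_saved_model",
    "instance_prompt": "a photo of sks dog",
    "resolution": 512,
    "train_batch_size": 1,
    "gradient_accumulation_steps": 1,
    "learning_rate": "5e-6",
    "lr_scheduler": "constant",
    "lr_warmup_steps": 0,
    "max_train_steps": 400,
}

command = build_launch_command("train_dreambooth.py", options)

# Print a shell-quoted version for inspection.
print(shlex.join(command))
# To start training, pass the list to subprocess.run(command, check=True).
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain spaces.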

JAX

If you have access to TPUs or want to train even faster, you can try the Flax training script. The Flax training script doesn't support gradient checkpointing or gradient accumulation, so you'll need a GPU with at least 30GB of memory.

Before running the script, make sure the requirements are installed:

pip install -U -r requirements.txt

Then launch the training script with the following command:

export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --output_dir=$OUTPUT_DIR \
  --instance_prompt="a photo of sks dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --max_train_steps=400

Fine-tuning with prior-preserving loss

Prior preservation is used to avoid overfitting and language drift (see the DreamBooth paper if you're interested in the details). For prior preservation, other images of the same class are used as part of the training process. The nice thing is that you can generate those images with the Stable Diffusion model itself! The training script saves the generated images to a local path that you specify.

The authors recommend generating num_epochs * num_samples images for prior preservation. In most cases, 200-300 images work well.
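
To make the role of `--prior_loss_weight` concrete, here is a toy sketch of how an instance loss and a class (prior) loss can be combined. It uses plain Python floats in place of tensors, and it assumes that each batch holds the instance examples in its first half and the class examples in its second half. The function names are hypothetical, and this is an illustration rather than the training script's code.

```python
def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(pred, target, prior_loss_weight=1.0):
    """Combine the instance loss with the weighted class (prior) loss.

    Assumption: the first half of each sequence comes from instance images
    and the second half from class images.
    """
    half = len(pred) // 2
    instance_loss = mse(pred[:half], target[:half])
    prior_loss = mse(pred[half:], target[half:])
    return instance_loss + prior_loss_weight * prior_loss


pred = [0.0, 2.0, 1.0, 1.0]
target = [0.0, 0.0, 1.0, 3.0]

# Both halves have an MSE of 2.0, so the weight decides the prior's share.
print(prior_preservation_loss(pred, target, prior_loss_weight=1.0))
print(prior_preservation_loss(pred, target, prior_loss_weight=0.5))
```

A weight of 1.0 treats the two terms equally; lowering it lets the model fit the subject more aggressively at the cost of drifting further from the original class.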

PyTorch

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800

JAX

export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=5e-6 \
  --num_class_images=200 \
  --max_train_steps=800
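
The commands above request 200 class images with `--num_class_images=200`. If the class directory already holds some images, only the shortfall needs to be generated. The sketch below shows that arithmetic; `missing_class_images` is a hypothetical helper, and whether the training script counts files in exactly this way is an assumption.

```python
import tempfile
from pathlib import Path


def missing_class_images(class_dir, num_class_images):
    """Return how many class images still have to be generated."""
    class_path = Path(class_dir)
    if not class_path.is_dir():
        return num_class_images
    existing = sum(1 for path in class_path.iterdir() if path.is_file())
    return max(num_class_images - existing, 0)


# Demo on a throwaway directory that already contains three images.
with tempfile.TemporaryDirectory() as class_dir:
    for index in range(3):
        (Path(class_dir) / f"{index}.jpg").touch()
    to_generate = missing_class_images(class_dir, 200)

print(to_generate)
```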

Fine-tuning the text encoder and UNet

The script also lets you fine-tune the text_encoder along with the unet. In our experiments (see the post Training Stable Diffusion with DreamBooth using 🧨 Diffusers for details), this yields much better results, especially when generating images of faces.

Training the text encoder requires additional memory, so it won't fit on a 16GB GPU. You need at least 24GB of VRAM to use this option.

Pass the --train_text_encoder argument to the training script to fine-tune both the text_encoder and the unet:

PyTorch

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --use_8bit_adam \
  --gradient_checkpointing \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800

JAX

export MODEL_NAME="duongna/stable-diffusion-v1-4-flax"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

python train_dreambooth_flax.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --train_text_encoder \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --learning_rate=2e-6 \
  --num_class_images=200 \
  --max_train_steps=800

Fine-tuning with LoRA

You can also use LoRA (Low-Rank Adaptation of Large Language Models), a fine-tuning technique for accelerating the training of large models, with DreamBooth. For more details, see the LoRA training guide.
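
As a rough intuition for why a low-rank update is cheap to train, the sketch below compares the number of trainable values in a full weight matrix with the number in a rank-r factorization of its update. The layer dimensions and rank are hypothetical illustration values, not settings taken from this guide.

```python
def lora_parameter_counts(d_out, d_in, rank):
    """Return (full, low_rank) trainable parameter counts for one weight matrix."""
    full = d_out * d_in
    # A low-rank update factors into a (d_out x rank) and a (rank x d_in) matrix.
    low_rank = rank * (d_out + d_in)
    return full, low_rank


full, low_rank = lora_parameter_counts(d_out=768, d_in=768, rank=4)
print(full, low_rank, full // low_rank)  # 589824 6144 96
```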

Saving checkpoints while training

It's easy to overfit while training with DreamBooth, so it's useful to save checkpoints at regular intervals during training. One of the intermediate checkpoints might work better than the final model! Pass the following argument to the training script to enable checkpoint saving:

  --checkpointing_steps=500

This saves the full training state in subfolders of your output_dir. Subfolder names begin with the prefix checkpoint-, followed by the number of steps performed so far. For example, checkpoint-1500 is a checkpoint saved after 1500 training steps.

Resuming training from a saved checkpoint

To resume training from a saved checkpoint, pass the --resume_from_checkpoint argument and specify the name of the checkpoint to use. You can also use the special string "latest" to resume from the last saved checkpoint (the one with the largest number of steps). For example, the following resumes training from the checkpoint saved after 1500 steps:

  --resume_from_checkpoint="checkpoint-1500"

You can adjust some hyperparameters at this point if you wish.
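
To see which checkpoint "latest" would correspond to, you can scan the output directory yourself. The sketch below assumes only the `checkpoint-<step>` naming described above; `latest_checkpoint` is a hypothetical helper, not a Diffusers or Accelerate function.

```python
import tempfile
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the name of the checkpoint-<step> subfolder with the most steps, or None."""
    candidates = []
    for path in Path(output_dir).iterdir():
        prefix, _, step = path.name.partition("-")
        if path.is_dir() and prefix == "checkpoint" and step.isdigit():
            # Compare steps as integers: as strings, "500" would sort after "1500".
            candidates.append((int(step), path.name))
    return max(candidates)[1] if candidates else None


# Demo on a throwaway directory laid out like an output_dir.
with tempfile.TemporaryDirectory() as output_dir:
    for step in (500, 1000, 1500):
        (Path(output_dir) / f"checkpoint-{step}").mkdir()
    newest = latest_checkpoint(output_dir)

print(newest)  # checkpoint-1500
```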

Running inference from a saved checkpoint

Saved checkpoints are stored in a format suitable for resuming training. They include not only the model weights but also the state of the optimizer, the data loaders, and the learning rate.

If you have "accelerate>=0.16.0" installed, use the following code to run inference from an intermediate checkpoint.

from diffusers import DiffusionPipeline, UNet2DConditionModel
from transformers import CLIPTextModel
import torch

# Load the pipeline with the same arguments (model, revision) that were used for training.
model_id = "CompVis/stable-diffusion-v1-4"

unet = UNet2DConditionModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/unet")

# If you trained with `--train_text_encoder`, make sure to load the text encoder as well.
text_encoder = CLIPTextModel.from_pretrained("/sddata/dreambooth/daruma-v2-1/checkpoint-100/text_encoder")

pipeline = DiffusionPipeline.from_pretrained(model_id, unet=unet, text_encoder=text_encoder, dtype=torch.float16)
pipeline.to("cuda")

# Run inference, save the pipeline, or push it to the Hub.
pipeline.save_pretrained("dreambooth-pipeline")
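
The snippet above loads a `text_encoder` subfolder unconditionally, which presumably only exists if training used `--train_text_encoder`. A small check such as the following can tell you which components a checkpoint folder contains before you load them. `checkpoint_components` is a hypothetical helper, and the subfolder names are taken from the paths used above.

```python
import tempfile
from pathlib import Path


def checkpoint_components(checkpoint_dir, names=("unet", "text_encoder")):
    """Map each component name to whether its subfolder exists in the checkpoint."""
    root = Path(checkpoint_dir)
    return {name: (root / name).is_dir() for name in names}


# Demo: a fake checkpoint that contains only a unet subfolder.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint-100"
    (checkpoint / "unet").mkdir(parents=True)
    found = checkpoint_components(checkpoint)

print(found)  # {'unet': True, 'text_encoder': False}
```

If `text_encoder` is reported as missing, omit the `text_encoder=` argument so that the pipeline uses the text encoder from the base model.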

If you have "accelerate<0.16.0" installed, you need to convert it to an inference pipeline first:

from accelerate import Accelerator
from diffusers import DiffusionPipeline

# Load the pipeline with the same arguments (model, revision) that were used for training.
model_id = "CompVis/stable-diffusion-v1-4"
pipeline = DiffusionPipeline.from_pretrained(model_id)

accelerator = Accelerator()

# Use text_encoder if `--train_text_encoder` was used for the initial training.
unet, text_encoder = accelerator.prepare(pipeline.unet, pipeline.text_encoder)

# Restore the state from a checkpoint path. You have to use the absolute path here.
accelerator.load_state("/sddata/dreambooth/daruma-v2-1/checkpoint-100")

# Rebuild the pipeline with the unwrapped models (assigning to .unet and .text_encoder should also work).
pipeline = DiffusionPipeline.from_pretrained(
    model_id,
    unet=accelerator.unwrap_model(unet),
    text_encoder=accelerator.unwrap_model(text_encoder),
)

# Run inference, save the pipeline, or push it to the Hub.
pipeline.save_pretrained("dreambooth-pipeline")

Optimizations for different GPU sizes

Depending on your hardware, there are a few ways to optimize DreamBooth for GPUs with anywhere from 16GB down to 8GB of memory!
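
As a back-of-the-envelope illustration of where the memory goes, the sketch below estimates the space taken by weights, gradients, and optimizer state alone. The parameter count (about 860 million for the Stable Diffusion v1 UNet) and the per-parameter byte sizes are stated assumptions, and activations, the VAE, and the text encoder are ignored, so treat the numbers as rough lower bounds.

```python
def training_state_gib(num_params, weight_bytes=4, grad_bytes=4, optimizer_bytes=8):
    """Rough memory in GiB for weights, gradients, and optimizer state."""
    total_bytes = num_params * (weight_bytes + grad_bytes + optimizer_bytes)
    return total_bytes / 1024**3


UNET_PARAMS = 860_000_000  # assumption: approximate UNet size for Stable Diffusion v1

# fp32 Adam keeps two fp32 moments per parameter (8 bytes).
full = training_state_gib(UNET_PARAMS)
# Assumption: an 8-bit optimizer keeps roughly two 1-byte moments per parameter.
eight_bit = training_state_gib(UNET_PARAMS, optimizer_bytes=2)

print(f"fp32 Adam: {full:.1f} GiB, 8-bit Adam: {eight_bit:.1f} GiB")
```

Under these assumptions the optimizer state is half of the total for fp32 Adam, which is why shrinking or offloading it has a large effect.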

xFormers

xFormers is a toolbox for optimizing Transformers, and it includes the memory-efficient attention mechanism used in 🧨 Diffusers. Install xFormers, then add the following argument to your training script:

  --enable_xformers_memory_efficient_attention

xFormers is not available in Flax.

Set gradients to None

Another way to lower memory usage is to set the gradients to None instead of zero. However, this may change certain behaviors, so if you run into any issues, try removing this argument. Add the following argument to your training script to set the gradients to None:

  --set_grads_to_none

16GB GPU

With the help of gradient checkpointing and the bitsandbytes 8-bit optimizer, you can train DreamBooth on a 16GB GPU. Make sure bitsandbytes is installed:

pip install bitsandbytes

Then pass the --use_8bit_adam option to the training script:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=2 --gradient_checkpointing \
  --use_8bit_adam \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800
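
The command above sets `--gradient_accumulation_steps=2` with `--train_batch_size=1`. Gradient accumulation sums gradients over several small batches before each optimizer update, so the update behaves as if a larger batch had been used. A minimal sketch of that arithmetic follows; `effective_batch_size` is a hypothetical helper, and `num_processes` stands for the number of devices, assumed to be 1 here.

```python
def effective_batch_size(train_batch_size, gradient_accumulation_steps, num_processes=1):
    """Number of examples that contribute to a single optimizer update."""
    return train_batch_size * gradient_accumulation_steps * num_processes


print(effective_batch_size(train_batch_size=1, gradient_accumulation_steps=2))  # 2
```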

12GB GPU

To run DreamBooth on a 12GB GPU, you need to enable gradient checkpointing, the 8-bit optimizer, and xFormers, and set the gradients to None.

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path-to-instance-images"
export CLASS_DIR="path-to-class-images"
export OUTPUT_DIR="path-to-save-model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --use_8bit_adam \
  --enable_xformers_memory_efficient_attention \
  --set_grads_to_none \
  --learning_rate=2e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800

8GB GPU

For 8GB GPUs, you can use DeepSpeed to offload some tensors from VRAM to the CPU or NVMe, which makes it possible to train with less GPU memory.

Run the following command to configure your 🤗 Accelerate environment:

accelerate config

During configuration, confirm that you want to use DeepSpeed. By combining DeepSpeed stage 2, fp16 mixed precision, and offloading both the model parameters and the optimizer state to the CPU, you can train with under 8GB of VRAM. The drawback is that this requires more system RAM (about 25GB). See the DeepSpeed documentation for more configuration options.

You should also replace the default Adam optimizer with DeepSpeed's optimized version, deepspeed.ops.adam.DeepSpeedCPUAdam, for a substantial speedup. Enabling DeepSpeedCPUAdam requires your system's CUDA toolchain version to match the one installed with PyTorch.

8-bit optimizers do not currently appear to be compatible with DeepSpeed.

Launch training with the following command:

export MODEL_NAME="CompVis/stable-diffusion-v1-4"
export INSTANCE_DIR="path_to_training_images"
export CLASS_DIR="path_to_class_images"
export OUTPUT_DIR="path_to_saved_model"

accelerate launch train_dreambooth.py \
  --pretrained_model_name_or_path=$MODEL_NAME \
  --instance_data_dir=$INSTANCE_DIR \
  --class_data_dir=$CLASS_DIR \
  --output_dir=$OUTPUT_DIR \
  --with_prior_preservation --prior_loss_weight=1.0 \
  --instance_prompt="a photo of sks dog" \
  --class_prompt="a photo of dog" \
  --resolution=512 \
  --train_batch_size=1 \
  --sample_batch_size=1 \
  --gradient_accumulation_steps=1 --gradient_checkpointing \
  --learning_rate=5e-6 \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --num_class_images=200 \
  --max_train_steps=800 \
  --mixed_precision=fp16

Inference

Once you have trained a model, you can run inference with StableDiffusionPipeline by specifying the path where the model was saved. Make sure your prompt includes the special identifier used during training (sks in the previous examples).

For example, the following code loads the saved model and generates an image:

from diffusers import StableDiffusionPipeline
import torch

model_id = "path_to_saved_model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")

prompt = "A photo of sks dog in a bucket"
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]

image.save("dog-bucket.png")
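
Because a prompt without the identifier will not refer to your subject, a quick sanity check before generating a batch of images can save time. The helper below is a hypothetical sketch that looks for the identifier as a whole word, ignoring case.

```python
import re


def has_identifier(prompt, identifier="sks"):
    """Return True if the identifier appears in the prompt as a whole word."""
    pattern = rf"\b{re.escape(identifier)}\b"
    return re.search(pattern, prompt, flags=re.IGNORECASE) is not None


print(has_identifier("A photo of sks dog in a bucket"))  # True
print(has_identifier("A photo of a dog in a bucket"))    # False
```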

You can also run inference from any of the saved training checkpoints, as described above.