Diffusers documentation

Textual inversion

StableDiffusionPipeline supports textual inversion, a technique that lets a model such as Stable Diffusion learn a new concept from just a few sample images. This gives you more control over the generated images and lets you tailor the model to specific concepts. You can quickly try out a collection of community-created concepts with the Stable Diffusion Conceptualizer.

This guide shows how to run inference with textual inversion using a pre-learned concept from the Stable Diffusion Conceptualizer. If you're interested in teaching a model new concepts with textual inversion, see the Textual Inversion training guide.

Log in with your Hugging Face account:

from huggingface_hub import notebook_login

notebook_login()

Import the required libraries and create a helper function, image_grid, to visualize the generated images:

import os
import torch

import PIL
from PIL import Image

from diffusers import StableDiffusionPipeline
from transformers import CLIPImageProcessor, CLIPTextModel, CLIPTokenizer


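def image_grid(imgs, rows, cols):
    """Paste equally sized PIL images into a rows x cols grid, row by row."""
    assert len(imgs) == rows * cols

    # All images are assumed to share the size of the first one
    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        # Column index is i % cols, row index is i // cols
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid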
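
To make the layout logic in image_grid easier to follow, the short sketch below computes only the paste positions, without creating any images. The helper name paste_offsets and the 512x512 tile size are assumptions chosen for illustration; the index arithmetic is the same expression used in image_grid.

```python
def paste_offsets(num_images, cols, w, h):
    """Top-left (x, y) paste position for each image index, in row-major order."""
    # Same arithmetic as image_grid: column = i % cols, row = i // cols
    return [(i % cols * w, i // cols * h) for i in range(num_images)]


# Four tiles of an assumed 512x512 size, arranged in two columns
offsets = paste_offsets(num_images=4, cols=2, w=512, h=512)
print(offsets)  # [(0, 0), (512, 0), (0, 512), (512, 512)]
```

Images are placed left to right and then top to bottom, so the order of the list passed to image_grid determines where each image ends up.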

Pick a Stable Diffusion checkpoint and a pre-learned concept from the Stable Diffusion Conceptualizer:

pretrained_model_name_or_path = "stable-diffusion-v1-5/stable-diffusion-v1-5"
repo_id_embeds = "sd-concepts-library/cat-toy"

Now you can load the pipeline and pass the pre-learned concept to it:

pipeline = StableDiffusionPipeline.from_pretrained(pretrained_model_name_or_path, torch_dtype=torch.float16).to("cuda")

pipeline.load_textual_inversion(repo_id_embeds)
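
As a rough mental model of what loading a concept involves, the toy sketch below registers a new placeholder token and appends its embedding vector to an embedding table. It is a simplified illustration in plain Python, not the diffusers implementation: the add_concept helper, the tiny vocabulary, and the two-dimensional vectors are all hypothetical.

```python
# Toy illustration only: a plain-Python stand-in for a tokenizer vocabulary
# and a text-encoder embedding table. Names and values here are hypothetical.

def add_concept(vocab, embeddings, token, vector):
    """Register a placeholder token and append its learned embedding row."""
    if token in vocab:
        raise ValueError(f"token {token!r} already exists in the vocabulary")
    if embeddings and len(vector) != len(embeddings[0]):
        raise ValueError("embedding dimension mismatch")
    vocab[token] = len(embeddings)  # the new id is the next free row
    embeddings.append(list(vector))
    return vocab[token]


vocab = {"a": 0, "cat": 1, "toy": 2}
embeddings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

new_id = add_concept(vocab, embeddings, "<cat-toy>", [0.7, 0.8])
print(new_id, embeddings[new_id])
```

The rest of the model is left untouched in this picture; only the vocabulary and the embedding table grow by one entry, which is why the concept is addressed through its placeholder token in the prompt.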

Create a prompt with the pre-learned concept by using the special placeholder token "<cat-toy>", and choose the number of samples and the number of image rows to generate:

prompt = "a grafitti in a favela wall with a <cat-toy> on it"

num_samples = 2
num_rows = 2

Then run the pipeline, collect the generated images, and use the image_grid helper function you created earlier to visualize the results. Feel free to adjust parameters such as num_inference_steps and guidance_scale to see how they affect image quality.

all_images = []
for _ in range(num_rows):
    # Each call returns num_samples images, which fill one row of the grid
    images = pipeline(prompt, num_images_per_prompt=num_samples, num_inference_steps=50, guidance_scale=7.5).images
    all_images.extend(images)

# image_grid expects (imgs, rows, cols)
grid = image_grid(all_images, num_rows, num_samples)
grid
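
One way to explore num_inference_steps and guidance_scale systematically is to build a small list of settings first and then run the pipeline once per entry. The sketch below only builds that list. The build_sweep helper and the specific values are assumptions for illustration, not recommended settings.

```python
from itertools import product


def build_sweep(steps_values, guidance_values):
    """Return one kwargs dict per (num_inference_steps, guidance_scale) pair."""
    return [
        {"num_inference_steps": steps, "guidance_scale": scale}
        for steps, scale in product(steps_values, guidance_values)
    ]


# Hypothetical values to compare; adjust to taste
sweep = build_sweep([25, 50], [5.0, 7.5, 10.0])
for kwargs in sweep:
    print(kwargs)

# With the pipeline from this guide loaded, each entry could be used as:
#   images = pipeline(prompt, num_images_per_prompt=num_samples, **kwargs).images
```

Keeping the settings in a list makes it easy to label each generated image with the parameters that produced it when comparing results.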