arXiv: 2406.12042
Authors: Alireza Ganjdanesh, Reza Shirkavand, Shangqian Gao, Heng Huang
Published: 2024-06-17
Linked model (shared by rezashkv): https://huggingface.co/rezashkv/diffusion_pruning
Not All Prompts Are Made Equal: Prompt-based Pruning of Text-to-Image Diffusion Models
Abstract
Adaptive Prompt-Tailored Pruning (APTP) optimizes T2I diffusion models by adjusting model capacity per text prompt, improving efficiency and generation quality.
Text-to-image (T2I) diffusion models have demonstrated impressive image generation capabilities. However, their computational intensity prevents resource-constrained organizations from deploying T2I models after fine-tuning them on their internal target data. While pruning techniques offer a potential solution to reduce the computational burden of T2I models, static pruning methods use the same pruned model for all input prompts, overlooking the varying capacity requirements of different prompts. Dynamic pruning addresses this issue by using a separate sub-network for each prompt, but it prevents batch parallelism on GPUs. To overcome these limitations, we introduce Adaptive Prompt-Tailored Pruning (APTP), a novel prompt-based pruning method designed for T2I diffusion models. Central to our approach is a prompt router model, which learns to determine the required capacity for an input text prompt and routes it to an architecture code, given a total desired compute budget for prompts. Each architecture code represents a specialized model tailored to the prompts assigned to it, and the number of codes is a hyperparameter. We train the prompt router and architecture codes using contrastive learning, ensuring that similar prompts are mapped to nearby codes. Further, we employ optimal transport to prevent the codes from collapsing into a single one. We demonstrate APTP's effectiveness by pruning Stable Diffusion (SD) V2.1 using CC3M and COCO as target datasets. APTP outperforms the single-model pruning baselines in terms of FID, CLIP, and CMMD scores. Our analysis of the clusters learned by APTP reveals that they are semantically meaningful. We also show that APTP can automatically discover prompts previously found empirically to be challenging for SD, e.g., prompts for generating images containing text, assigning them to higher-capacity codes.
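The two mechanisms described in the abstract, routing a prompt to its nearest architecture code and using optimal transport to keep the codes from collapsing onto one another, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the embedding shapes, the entropic (Sinkhorn-style) transport variant, and the function names are all illustrative assumptions.

```python
import numpy as np

def sinkhorn(scores, n_iters=10, eps=0.05):
    """Balance prompt-to-code assignments with entropic optimal transport,
    so that no single architecture code absorbs all prompts (code collapse).
    scores: (n_prompts, n_codes) similarity matrix."""
    Q = np.exp(scores / eps)               # similarities -> positive affinities
    Q /= Q.sum()
    n, k = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True)  # each prompt gets total mass 1/n
        Q /= n
        Q /= Q.sum(axis=0, keepdims=True)  # each code gets total mass 1/k
        Q /= k
    return Q * n                           # rows ~ soft assignment distributions

def route(prompt_emb, code_embs):
    """Route a prompt embedding to the architecture code with the highest
    cosine similarity; at inference, prompts sharing a code can be batched."""
    sims = code_embs @ prompt_emb / (
        np.linalg.norm(code_embs, axis=1) * np.linalg.norm(prompt_emb) + 1e-8)
    return int(np.argmax(sims))
```

During training, the balanced soft assignments from the transport step would supervise the router so that similar prompts map to nearby codes while every code stays in use; at inference, hard routing via `route` selects one specialized pruned sub-network per prompt.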
Community
Models citing this paper 1
Datasets citing this paper 0
No dataset linking this paper
Spaces citing this paper 0
No Space linking this paper