

Question about the text encoder setting

#81
by JungaoCanada - opened

Hi,

I think there may be a problem in the text encoder setup, though I'm not sure why it occurs.

In particular, the text encoder config sets the number of hidden layers to 23 (https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/text_encoder/config.json#L19). However, the official OpenCLIP ViT-H-14 config specifies 24 layers (https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/model_configs/ViT-H-14.json#L15), and this is also confirmed by the layer count in the LAION CLIP ViT-H-14 repo (https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/blob/main/config.json#L54).
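For reference, here is a minimal sketch to reproduce the discrepancy directly from the two configs (assuming `transformers` is installed; the values in the comments come from the linked config files):

```python
# Load both text encoder configs and compare the reported layer counts.
from transformers import CLIPTextConfig

# Text encoder shipped with Stable Diffusion 2.1
sd_cfg = CLIPTextConfig.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="text_encoder"
)

# LAION's converted OpenCLIP ViT-H-14 (a full CLIP config; the text
# sub-config is extracted automatically by from_pretrained)
laion_cfg = CLIPTextConfig.from_pretrained(
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
)

print(sd_cfg.num_hidden_layers)     # 23, per the SD 2.1 config
print(laion_cfg.num_hidden_layers)  # 24, per the LAION config
```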

Does anyone know why the Hugging Face repo sets the number of hidden layers to 23? Is this a bug, or a small trick to improve sampling performance?

Thanks

Could this be because the last projection layer is removed from (or not used in) SD, since it takes the 77x1024 per-token text embedding as input rather than the final CLIP projection of dim 1024?
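A hedged sketch of that idea (assuming the standard diffusers layout of the SD 2.1 repo): the pipeline cross-attends to the per-token hidden states, not the pooled/projected CLIP embedding, which would be consistent with the conversion dropping the final transformer block and exposing a 23-layer encoder whose output matches the full model's penultimate layer.

```python
# Inspect what the SD 2.1 text encoder actually outputs. The UNet
# cross-attends to the 77x1024 per-token hidden states; the pooled
# 1024-dim CLIP projection is not used by the pipeline at all.
import torch
from transformers import CLIPTextModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="tokenizer"
)
text_encoder = CLIPTextModel.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="text_encoder"
)

tokens = tokenizer(
    "a photo of a cat",
    padding="max_length",
    max_length=tokenizer.model_max_length,  # 77 for CLIP tokenizers
    return_tensors="pt",
)
with torch.no_grad():
    out = text_encoder(input_ids=tokens.input_ids)

# The conditioning tensor passed to the UNet
print(out.last_hidden_state.shape)  # torch.Size([1, 77, 1024])
```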
