\n","updatedAt":"2023-12-01T22:33:16.496Z","author":{"_id":"62fa41d0363251ee40a2915d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62fa41d0363251ee40a2915d/AWbQCvPkxujxR5BCfCniz.jpeg","fullname":"Viktor Toth","name":"vtoth","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":4}},"numEdits":1,"identifiedLanguage":{"language":"en","probability":0.9129307270050049},"editors":["vtoth"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/62fa41d0363251ee40a2915d/AWbQCvPkxujxR5BCfCniz.jpeg"],"reactions":[],"isReport":false}}],"pinned":false,"locked":false,"collection":"discussions","isPullRequest":false,"isReport":false},"repo":{"name":"stabilityai/stable-diffusion-2-1","type":"model"},"activeTab":"discussion","discussionRole":0,"watched":false,"muted":false,"repoDiscussionsLocked":false}">Question in the Text encoder setting
\n","updatedAt":"2023-12-01T22:33:16.496Z","author":{"_id":"62fa41d0363251ee40a2915d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/62fa41d0363251ee40a2915d/AWbQCvPkxujxR5BCfCniz.jpeg","fullname":"Viktor Toth","name":"vtoth","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":4}},"numEdits":1,"identifiedLanguage":{"language":"en","probability":0.9129307270050049},"editors":["vtoth"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/62fa41d0363251ee40a2915d/AWbQCvPkxujxR5BCfCniz.jpeg"],"reactions":[],"isReport":false}}],"pinned":false,"locked":false,"collection":"discussions","isPullRequest":false,"isReport":false},"primaryEmailConfirmed":false,"repo":{"name":"stabilityai/stable-diffusion-2-1","type":"model"},"discussionRole":0,"acceptLanguages":["*"],"hideComments":true,"repoDiscussionsLocked":false,"isDiscussionAuthor":false}">Hi,
I think there may be a problem in how the text encoder is set up, though I'm not sure why this occurs.
In particular, the number of hidden layers in the text encoder is set to 23 (https://huggingface.co/stabilityai/stable-diffusion-2-1/blob/main/text_encoder/config.json#L19). However, in the official OpenCLIP ViT-H-14 config, the number of hidden layers is 24 (https://github.com/mlfoundations/open_clip/blob/main/src/open_clip/model_configs/ViT-H-14.json#L15). This can also be confirmed from the number of layers in the LAION CLIP ViT-H-14 repo: https://huggingface.co/laion/CLIP-ViT-H-14-laion2B-s32B-b79K/blob/main/config.json#L54
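For reference, here is a minimal sketch (assuming the `transformers` library is installed and both repos are publicly accessible) that reproduces the discrepancy directly from the two published configs:

```python
from transformers import CLIPTextConfig

# SD 2.1 ships its own text-encoder config in the text_encoder/ subfolder...
sd_cfg = CLIPTextConfig.from_pretrained(
    "stabilityai/stable-diffusion-2-1", subfolder="text_encoder"
)
# ...while the LAION OpenCLIP ViT-H-14 checkpoint keeps the full stack.
laion_cfg = CLIPTextConfig.from_pretrained(
    "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
)

print(sd_cfg.num_hidden_layers)     # 23
print(laion_cfg.num_hidden_layers)  # 24
```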
Does anyone know why the Hugging Face repo sets the number of hidden layers to 23? Is this a bug, or a small trick to improve sampling performance?
Thanks
Could this possibly be about the last projection layer being removed from / not used in SD, since SD takes the 77x1024 text embedding as input rather than the final CLIP projection of dim 1024?
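To illustrate that distinction, here is a hedged sketch (assuming the `transformers` library and the public LAION checkpoint; not an authoritative account of the SD pipeline) showing that the per-token hidden states are a 77x1024 sequence, while the final CLIP projection is a single 1024-dim vector:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModelWithProjection

repo = "laion/CLIP-ViT-H-14-laion2B-s32B-b79K"
tokenizer = CLIPTokenizer.from_pretrained(repo)
model = CLIPTextModelWithProjection.from_pretrained(repo)

tokens = tokenizer("a photo of a cat", padding="max_length",
                   max_length=77, return_tensors="pt")
with torch.no_grad():
    out = model(**tokens, output_hidden_states=True)

# Per-token hidden states form a sequence; the penultimate layer's output
# is hidden_states[-2] (hidden_states[0] is the embedding layer output).
print(out.hidden_states[-2].shape)  # torch.Size([1, 77, 1024])
# The pooled, projected CLIP text embedding is a single vector per prompt.
print(out.text_embeds.shape)        # torch.Size([1, 1024])
```

Since SD consumes the 77x1024 sequence and never the pooled projection, dropping the final block from the config would be consistent with conditioning on an earlier (penultimate) layer rather than the full 24-layer output, though I can't say for certain that this was the intent.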