
Training or Fine-tuning the Bloom AI Model on my own Dataset

#187

by NicolasExo - opened Feb 10, 2023

Discussion


NicolasExo

Feb 10, 2023

•

edited Feb 10, 2023

Hello everyone! I have a question for the community.

How can I train the BLOOM model on my own training dataset?
Is there a function in BLOOM like "BloomSomeClass.train(inputs, outputs, params)"?

Thank you in advance for your answers!

NicolasExo changed discussion title from Training the Bloom AI Model to Training or Fine-tuning the Bloom AI Model on my own Dataset Feb 10, 2023

justheuristic

BigScience Workshop org Feb 10, 2023

Hi!
You fine-tune BLOOM the same way you fine-tune any other model on HF.

Consider the official example for text classification: https://github.com/huggingface/transformers/tree/main/examples/pytorch/text-classification

In the README, you can find the line --model_name_or_path bert-base-multilingual-cased \

If you replace the model name in this line with bigscience/bloom-560m,
you will fine-tune the smallest BLOOM model on the dataset in question. If you are doing something other than text classification, please browse ../examples/pytorch to find the example that matches your task. Beware that if you want to train the largest BLOOM (bigscience/bloom), you will need several hundred gigabytes of GPU memory.
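
As a rough sketch of that swap, the snippet below assembles the command in Python. Only the --model_name_or_path value comes from this thread. The script name run_xnli.py and the remaining flags are assumptions based on my reading of that README and may differ in the current version. The batch size and output directory are placeholder values.

```python
import shlex
import sys


def build_finetune_command(model_name="bigscience/bloom-560m",
                           output_dir="/tmp/bloom_xnli/"):
    """Return the argument list for the text-classification example script."""
    return [
        sys.executable, "run_xnli.py",       # assumed script name from the README
        "--model_name_or_path", model_name,  # the only value that changes
        "--language", "de",                  # assumed flags, check the README
        "--train_language", "en",
        "--do_train",
        "--do_eval",
        "--per_device_train_batch_size", "8",  # placeholder, tune to your GPU
        "--learning_rate", "5e-5",
        "--num_train_epochs", "2.0",
        "--max_seq_length", "128",
        "--output_dir", output_dir,          # placeholder path
    ]


command = build_finetune_command()
# Print a copy-pasteable shell command instead of running it right away.
print(shlex.join(command))
```

To actually launch training, you could pass the list to subprocess.run(command, check=True) from the directory that contains the example script.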

If you want to do that in a modest setup, you can try https://github.com/bigscience-workshop/petals for distributed training.
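
To see why a modest setup cannot hold the largest model, here is a back-of-the-envelope estimate. The parameter counts are rounded (about 176 billion for bigscience/bloom, about 0.56 billion for bloom-560m). The 16 bytes per parameter for full fine-tuning is a common rule of thumb for mixed-precision training with Adam, not a measured figure, and it ignores activations.

```python
GB = 1e9  # decimal gigabytes

# Rounded parameter counts (approximate).
BLOOM_PARAMS = 176e9
BLOOM_560M_PARAMS = 0.56e9


def weights_gb(n_params, bytes_per_param=2):
    """Memory for the weights alone, assuming 16-bit (2-byte) parameters."""
    return n_params * bytes_per_param / GB


def full_finetune_gb(n_params, bytes_per_param=16):
    """Rule-of-thumb memory for weights, gradients and Adam states.

    Assumes 2 bytes of weights + 2 bytes of gradients + 12 bytes of
    fp32 optimizer state per parameter. Activations are not included.
    """
    return n_params * bytes_per_param / GB


for name, n in [("bloom-560m", BLOOM_560M_PARAMS), ("bloom", BLOOM_PARAMS)]:
    print(f"{name}: weights ~{weights_gb(n):.1f} GB, "
          f"full fine-tuning ~{full_finetune_gb(n):.1f} GB")
```

Under these assumptions the 16-bit weights of the largest model alone come to roughly 352 GB, which is where the "several hundred gigabytes" comes from, while bloom-560m fits on a single consumer GPU.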

NicolasExo

Feb 10, 2023

•

edited Feb 10, 2023

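Justheuristic, thank you very much for your answer! I am doing text generation for my project, and I would like to train the BLOOM model on my own dataset. In that case, should I browse the ../examples/pytorch link you kindly provided to find the necessary information?

Thank you very much!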
