Is the 14 programming language dataset uploaded on Hugging Face? Is there any other option to download the data?
I am looking for the programming language dataset used in the model, in order to fine-tune it. Where can I get it?
They are far behind "code-davinci-003", which has itself been surpassed by "gpt-3.5-turbo" with better results at a third of the price, but these are the best models I found:
I would also just search for "GitHub" in the Datasets section to find data for fine-tuning. For example, "codeparrot" has a few good ones.
Filter down to the language you want to fine tune on for better results:
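A rough sketch of that filtering step, in case it helps. It assumes the `datasets` library and the `codeparrot/github-code` dataset, and that each record carries a `language` field. The helper names `filter_by_language` and `stream_github_code` are made up for illustration, so check the dataset card before relying on any of this.

```python
from typing import Any, Dict, Iterable, Iterator


def filter_by_language(
    records: Iterable[Dict[str, Any]],
    language: str,
    field: str = "language",
) -> Iterator[Dict[str, Any]]:
    """Yield only the records whose `field` matches `language` (case-insensitive)."""
    wanted = language.lower()
    for record in records:
        if str(record.get(field, "")).lower() == wanted:
            yield record


def stream_github_code(language: str) -> Iterator[Dict[str, Any]]:
    """Stream one language from the Hub without downloading the whole dataset.

    Assumption: "codeparrot/github-code" is the dataset you picked and it
    exposes a "language" column. Depending on your `datasets` version,
    loading it may need extra arguments; see the dataset card.
    """
    # Imported lazily so the filter above works without the library installed.
    from datasets import load_dataset

    stream = load_dataset("codeparrot/github-code", split="train", streaming=True)
    return filter_by_language(stream, language)
```

Calling `stream_github_code("Python")` should then give you an iterator over Python files only, which you can feed into whatever fine-tuning pipeline you use. The filter itself works on any iterable of dicts, so you can try it on a few hand-written records first.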
I'm looking to do about the same. Best of luck!