Silicon Macs support.
Does this model work on Macs with Apple Silicon chips? I'm running it on a Mac Pro M1 and it gets stuck with:

UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
warnings.warn(

The process just sits there eating up CPU and memory, but no output is ever produced.
Hi @quantoser!
In which precision are you running the generation? The 7B model needs ~30GB of RAM just to be loaded on the CPU in float32. Could you try loading the model in bfloat16 instead?
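If it helps, here is a minimal sketch of what loading in bfloat16 could look like with transformers. It assumes the `google/gemma-7b` checkpoint and a plain CPU run; adjust the model id and prompt to your setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint for this thread; swap in the exact repo you are using.
model_id = "google/gemma-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16 (~14GB instead of ~30GB in float32).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
# Set max_new_tokens explicitly so generation has a clear stopping point
# (this also silences the max_length warning you are seeing).
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The ~14GB figure is just 7B parameters × 2 bytes; the machine still needs some headroom on top of that for activations and the KV cache.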
Yes, it does!
For example, you can use Gemma.cpp (https://github.com/google/gemma.cpp) or Ollama; both run on Macs.