Silicon Macs support.
Does this model work on Macs with Apple Silicon chips? I'm running it on a Mac Pro M1 and it gets stuck with:

UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
warnings.warn(

The process just sits there eating up CPU and memory, but no output is ever produced.
Hi @quantoser!
In which precision are you running the generation? The 7B model needs ~30GB of RAM just to be loaded on the CPU in float32. Could you try loading the model in bfloat16 instead?
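If it helps, here is a minimal sketch of what loading in bfloat16 could look like with transformers. It assumes the `google/gemma-7b` checkpoint and a plain CPU run; adjust the model id and prompt to your setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed checkpoint for this thread; swap in the exact repo you are using.
model_id = "google/gemma-7b"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# Load the weights in bfloat16 (~14GB instead of ~30GB in float32).
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Why is the sky blue?", return_tensors="pt")
# Set max_new_tokens explicitly so generation has a clear stopping point
# (this also silences the max_length warning you are seeing).
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The ~14GB figure is just 7B parameters × 2 bytes; the machine still needs some headroom on top of that for activations and the KV cache.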
Yes, it does!
For example, you can use Gemma.cpp (https://github.com/google/gemma.cpp) or Ollama; both run on Macs.