Open
@sparkFeiYang
Description
I tried to run the 27B Gemma 3 model on two 40 GB A100 GPUs. Running the script directly raises an out-of-memory error, because the bf16 weights are about 55 GB and a single A100 cannot hold them. I modified the script to use DeepSpeed and eventually got the model running on the 2 A100s, but each run takes far too long: with a large output_len such as 1500, a single prompt takes about 2 hours with the 27B bf16 model.
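For reference, here is a minimal sketch of one alternative to the DeepSpeed modification: letting Hugging Face Accelerate shard the bf16 weights across both A100s via device_map="auto". The model id, prompt, and generation settings are assumptions, not taken from the original script.

```python
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="google/gemma-3-27b-it",  # assumed checkpoint name
    torch_dtype=torch.bfloat16,     # ~55 GB of weights, too large for one 40 GB card
    device_map="auto",              # shard layers across both visible GPUs
)

out = pipe("Write a short poem about GPUs.", max_new_tokens=1500)
print(out[0]["generated_text"])
```

Note that device_map="auto" splits the model layer-wise, so only one GPU is active at a time during generation; it avoids the OOM but will not by itself fix the slow decoding.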