Is 4 x H20 96G sufficient to run this model?
opened by milongwong
We have limited resources and a few questions:
- Is 4 x H20 96G sufficient to run this model?
- Has anyone tried running it with SGLang to get better performance?
The quantized weights alone are 346 GB, which is still very large. 4 x H20 96G (384 GB total) can hold them, but that leaves only about 38 GB for the KV cache, so the usable context length will be very short.
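As a rough sanity check, here is a back-of-the-envelope sketch of that budget. All architecture numbers below are assumptions for illustration only; substitute your model's actual config (and note that MLA or quantized KV caches change the per-token cost):

```python
# Rough KV-cache budget estimate for 4 x H20 96G with 346 GB of weights.
GIB = 1024**3

total_gpu_mem = 4 * 96 * GIB       # 4 x H20 96G
weights = 346 * GIB                # quantized checkpoint size from above
overhead = 8 * GIB                 # assumed activations / runtime overhead

# Assumed transformer shape (hypothetical, not this model's real config):
num_layers = 61
num_kv_heads = 8                   # GQA
head_dim = 128
kv_bytes = 2                       # bf16 KV cache

# Per-token KV cache cost: key + value tensors, for every layer.
kv_per_token = 2 * num_layers * num_kv_heads * head_dim * kv_bytes

free_for_kv = total_gpu_mem - weights - overhead
max_tokens = free_for_kv // kv_per_token
print(f"KV cache per token: {kv_per_token / 1024:.1f} KiB")
print(f"Approx. max total tokens across all sequences: {max_tokens:,}")
```

With these made-up numbers the whole KV cache budget is only tens of thousands of tokens shared across all concurrent requests, which is why the practical context length ends up so short.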
- I don't think the model would fit on that configuration, especially considering that on top of the weights you need extra memory for a reasonably large context size.
- We build our models specifically for vLLM; we are not aware of their compatibility with SGLang.
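For reference, a minimal vLLM sketch for a 4-GPU tensor-parallel setup. The model ID is a placeholder, and the context length is deliberately capped to fit the tight KV-cache budget discussed above:

```python
from vllm import LLM, SamplingParams

# Placeholder model ID -- substitute the actual quantized checkpoint.
llm = LLM(
    model="org/quantized-model",
    tensor_parallel_size=4,        # shard the weights across the 4 x H20 GPUs
    max_model_len=4096,            # cap context to fit the small KV-cache budget
    gpu_memory_utilization=0.95,   # leave a little headroom per GPU
)

outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=128),
)
print(outputs[0].outputs[0].text)
```

Lowering `max_model_len` is the main lever here: vLLM pre-allocates the KV cache from whatever memory remains after loading the weights, so a smaller cap is what makes the model start at all on a near-full configuration.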
ekurtic changed discussion status to closed