Is 4 x H20 96G sufficient to run this model?

by milongwong - opened

We have limited resources and have the following questions:

  1. Is 4 x H20 96G sufficient to run this model?
  2. Has anyone tried running it with SGLang for better performance?

The quantized weights alone are 346 GB, which is still very large.

4 x H20 96G can load it, but the usable context length will be very short, since the weights leave almost no memory for the KV cache.
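
As a rough sanity check, the headroom left for the KV cache can be estimated as below. All model hyperparameters in this sketch are illustrative assumptions, not values from the actual model config:

```python
# Rough VRAM headroom estimate for 4 x H20 96G.
# Model hyperparameters below are placeholder assumptions,
# not taken from the actual model config.

GIB = 1024**3

total_vram = 4 * 96 * GIB   # 384 GiB across 4 GPUs
weights = 346 * GIB         # quantized weights, per the post above
overhead = 8 * GIB          # assumed activation / framework overhead
headroom = total_vram - weights - overhead

# Per-token KV cache size: 2 (K and V) * layers * kv_heads * head_dim * bytes.
# Substitute the real config values for a meaningful number.
num_layers, num_kv_heads, head_dim, dtype_bytes = 61, 8, 128, 2
kv_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

print(f"headroom: {headroom / GIB:.1f} GiB")
print(f"approx. max cached tokens: {headroom // kv_per_token:,}")
```

With these assumed numbers only about 30 GiB remains for the KV cache, which is why the practical context length ends up short even if the weights technically fit.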

Red Hat AI org
  1. I don't think the model would fit on that configuration, especially since, on top of the weights, you need extra memory for a reasonably large context size.
  2. We create our models specifically for vLLM. We are not aware of their compatibility with SGLang.
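
For reference, a minimal vLLM launch sketch with 4-way tensor parallelism; the model ID and context length are placeholder assumptions, not the actual checkpoint:

```python
# Minimal vLLM sketch for 4 GPUs; model ID and settings are
# placeholder assumptions, substitute the real checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="RedHatAI/<quantized-model-id>",  # placeholder, not a real ID
    tensor_parallel_size=4,                 # shard weights across 4 GPUs
    max_model_len=4096,                     # keep small so the KV cache fits
    gpu_memory_utilization=0.95,            # use most of each 96G card
)

out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```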
ekurtic changed discussion status to closed
