Does the model support multi-token prediction? If so, how do you configure it in inference engines like vLLM or llama.cpp?
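For context, recent vLLM releases expose MTP-style drafting through the speculative-decoding path. A minimal sketch, assuming a vLLM build whose `LLM()` accepts a `speculative_config` dict; the `"deepseek_mtp"` method name and the model id are assumptions (borrowed from DeepSeek-style MTP support) and may not apply to this model, so check the docs for your installed version:

```python
# Hedged sketch: enabling MTP-style speculative decoding in vLLM.
# Assumes a vLLM version that accepts `speculative_config`; the
# "deepseek_mtp" method name is an assumption and may differ here.
from vllm import LLM, SamplingParams

llm = LLM(
    model="org/model-id",                # placeholder: this model's hub id
    speculative_config={
        "method": "deepseek_mtp",        # assumption: MTP drafting method
        "num_speculative_tokens": 1,     # draft one extra token per step
    },
)

params = SamplingParams(temperature=0.7, max_tokens=128)
out = llm.generate(["Hello, world"], params)
print(out[0].outputs[0].text)
```

llama.cpp does not expose an equivalent switch as far as I know, so whether the MTP head is usable there at all would need confirmation from the maintainers.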