Corrupted weights?
I have been trying to write an inference engine for CoDA in Swift/MLX, but it only generates gibberish. I then checked the weights, for example:
Layer 23 kNorm loaded: shape=[128], std=1.2598647
RAW kNorm stats: mean=2.0254, min=-0.0121, max=9.0000
I then ran the same check in Google Colab with PyTorch, for example:
--------------------
Layer: model.layers.23.self_attn.q_proj.weight
Stats: Mean=-0.0001, Std=0.0584, Min=-0.4121, Max=0.4141
--------------------
Layer: model.layers.23.self_attn.k_proj.weight
Stats: Mean=-0.0000, Std=0.0542, Min=-0.3867, Max=0.4062
--------------------
Layer: model.layers.23.self_attn.v_proj.weight
Stats: Mean=0.0001, Std=0.0614, Min=-0.3945, Max=0.3496
--------------------
Layer: model.layers.23.self_attn.o_proj.weight
Stats: Mean=0.0000, Std=0.0566, Min=-0.4785, Max=0.4375
--------------------
Layer: model.layers.23.self_attn.q_norm.weight
Stats: Mean=1.4233, Std=0.5039, Min=-0.0302, Max=2.6094
--------------------
Layer: model.layers.23.self_attn.k_norm.weight
Stats: Mean=2.0254, Std=1.2648, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.mlp.gate_proj.weight
Stats: Mean=-0.0002, Std=0.0608, Min=-1.3203, Max=0.8750
--------------------
Layer: model.layers.23.mlp.up_proj.weight
Stats: Mean=0.0000, Std=0.0683, Min=-0.7930, Max=0.7422
--------------------
Layer: model.layers.23.mlp.down_proj.weight
Stats: Mean=-0.0000, Std=0.0622, Min=-1.0391, Max=1.1094
--------------------
Layer: model.layers.23.input_layernorm.weight
Stats: Mean=10.5016, Std=5.4567, Min=0.0001, Max=74.5000
--------------------
Layer: model.layers.23.post_attention_layernorm.weight
Stats: Mean=2.0072, Std=0.3130, Min=-0.0005, Max=5.1875
--------------------
I'll gladly provide all the values if needed.
But the question is: Are the weights corrupted?
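For reference, per-tensor statistics in the format shown above could be produced roughly as follows. This is a minimal sketch, not the script actually used in this thread: `format_stats` is a hypothetical helper, it works on a plain list of floats using only the standard library, and it assumes the sample standard deviation (n-1), which is also the default for PyTorch's `Tensor.std()`.

```python
import statistics


def format_stats(name, values):
    """Format mean/std/min/max of a flat sequence of floats in the log style above.

    `name` and `values` are illustrative inputs, not taken from the model.
    """
    values = list(values)
    mean = statistics.fmean(values)
    # Sample standard deviation (n-1); falls back to 0.0 for a single element.
    std = statistics.stdev(values) if len(values) > 1 else 0.0
    return (
        f"Layer: {name}\n"
        f"Stats: Mean={mean:.4f}, Std={std:.4f}, "
        f"Min={min(values):.4f}, Max={max(values):.4f}"
    )


# Toy input, only to show the output format.
demo = format_stats("demo.weight", [1.0, 2.0, 3.0])
print(demo)
```

With PyTorch, the values for each `(name, param)` pair from `model.named_parameters()` could be obtained with `param.detach().float().flatten().tolist()`, at the cost of copying every tensor into Python lists.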
Hi Muzel, thanks for checking in. Could you share the versions in your environment, especially the transformers version?
- transformers: 4.57.1
- torch: 2.8.0+cu126
- Python 3.12
Could you try an older version, say 4.47.1?
With 4.47.1:
Layer: model.layers.23.self_attn.q_proj.weight
Stats: Mean=-0.0001, Std=0.0584, Min=-0.4121, Max=0.4141
--------------------
Layer: model.layers.23.self_attn.k_proj.weight
Stats: Mean=-0.0000, Std=0.0542, Min=-0.3867, Max=0.4062
--------------------
Layer: model.layers.23.self_attn.v_proj.weight
Stats: Mean=0.0001, Std=0.0614, Min=-0.3945, Max=0.3496
--------------------
Layer: model.layers.23.self_attn.o_proj.weight
Stats: Mean=0.0000, Std=0.0566, Min=-0.4785, Max=0.4375
--------------------
Layer: model.layers.23.self_attn.q_norm.weight
Stats: Mean=1.4233, Std=0.5039, Min=-0.0302, Max=2.6094
--------------------
Layer: model.layers.23.self_attn.k_norm.weight
Stats: Mean=2.0254, Std=1.2648, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.mlp.gate_proj.weight
Stats: Mean=-0.0002, Std=0.0608, Min=-1.3203, Max=0.8750
--------------------
Layer: model.layers.23.mlp.up_proj.weight
Stats: Mean=0.0000, Std=0.0683, Min=-0.7930, Max=0.7422
--------------------
Layer: model.layers.23.mlp.down_proj.weight
Stats: Mean=-0.0000, Std=0.0622, Min=-1.0391, Max=1.1094
--------------------
Layer: model.layers.23.input_layernorm.weight
Stats: Mean=10.5016, Std=5.4567, Min=0.0001, Max=74.5000
--------------------
Layer: model.layers.23.post_attention_layernorm.weight
Stats: Mean=2.0072, Std=0.3130, Min=-0.0005, Max=5.1875
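The 4.57.1 and 4.47.1 dumps can be compared mechanically rather than by eye. The following is a minimal sketch that assumes the `Layer:` / `Stats:` log format shown above; `parse_stats_log` and `diff_runs` are hypothetical helper names, and the sample strings are two-layer excerpts copied from this thread.

```python
import re

# Matches pairs such as "Mean=-0.0001" or "Max=74.5000".
STAT_RE = re.compile(r"(\w+)=(-?\d+\.\d+)")


def parse_stats_log(text):
    """Parse 'Layer: <name>' / 'Stats: ...' line pairs into {name: {stat: float}}."""
    result, current = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Layer:"):
            current = line.split(":", 1)[1].strip()
        elif line.startswith("Stats:") and current is not None:
            result[current] = {k: float(v) for k, v in STAT_RE.findall(line)}
            current = None
    return result


def diff_runs(a, b, tol=0.0):
    """Return {layer: {stat: (a_value, b_value)}} for stats differing by more than tol."""
    diffs = {}
    for name in a.keys() & b.keys():
        changed = {
            k: (a[name][k], b[name][k])
            for k in a[name]
            if k in b[name] and abs(a[name][k] - b[name][k]) > tol
        }
        if changed:
            diffs[name] = changed
    return diffs


excerpt = """\
Layer: model.layers.23.self_attn.k_norm.weight
Stats: Mean=2.0254, Std=1.2648, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.input_layernorm.weight
Stats: Mean=10.5016, Std=5.4567, Min=0.0001, Max=74.5000
"""
run_4571 = parse_stats_log(excerpt)  # excerpt from the 4.57.1 dump
run_4471 = parse_stats_log(excerpt)  # the 4.47.1 dump shows the same lines
print(diff_runs(run_4571, run_4471))
```

An empty result means the two versions loaded the same values for the compared layers, which points away from a version-dependent loading problem.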
Did you also experience similar behavior? We did our post-training and eval under 4.47.1 and bfloat16 precision.
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Salesforce/CoDA-v0-Instruct"
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
Sorry, I don't understand what you mean by 'did you experience similar behavior'. I haven't tried running inference with transformers, as my inference engine uses MLX.
I reran the weight logging with exactly your configuration:
Layer: model.layers.23.self_attn.q_proj.weight
Stats: Mean=-0.0001, Std=0.0583, Min=-0.4121, Max=0.4141
--------------------
Layer: model.layers.23.self_attn.k_proj.weight
Stats: Mean=-0.0000, Std=0.0542, Min=-0.3867, Max=0.4062
--------------------
Layer: model.layers.23.self_attn.v_proj.weight
Stats: Mean=0.0001, Std=0.0615, Min=-0.3945, Max=0.3496
--------------------
Layer: model.layers.23.self_attn.o_proj.weight
Stats: Mean=0.0000, Std=0.0566, Min=-0.4785, Max=0.4375
--------------------
Layer: model.layers.23.self_attn.q_norm.weight
Stats: Mean=1.4219, Std=0.5039, Min=-0.0302, Max=2.6094
--------------------
Layer: model.layers.23.self_attn.k_norm.weight
Stats: Mean=2.0312, Std=1.2656, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.mlp.gate_proj.weight
Stats: Mean=-0.0002, Std=0.0608, Min=-1.3203, Max=0.8750
--------------------
Layer: model.layers.23.mlp.up_proj.weight
Stats: Mean=0.0000, Std=0.0684, Min=-0.7930, Max=0.7422
--------------------
Layer: model.layers.23.mlp.down_proj.weight
Stats: Mean=-0.0000, Std=0.0623, Min=-1.0391, Max=1.1094
--------------------
Layer: model.layers.23.input_layernorm.weight
Stats: Mean=10.5000, Std=5.4688, Min=0.0001, Max=74.5000
--------------------
Layer: model.layers.23.post_attention_layernorm.weight
Stats: Mean=2.0000, Std=0.3125, Min=-0.0005, Max=5.1875
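The values in this run differ slightly from the earlier dumps (for example, the k_norm mean is 2.0312 rather than 2.0254). That is consistent with the statistics themselves being computed and stored in bfloat16, which keeps only 8 significant bits, rather than with any change in the weights. A minimal sketch using only the standard library: `to_bfloat16` is a hypothetical helper that rounds to nearest-even and ignores NaN and overflow, and the inputs are the four-decimal values printed earlier, so this approximates what likely happened rather than reproducing it.

```python
import struct


def to_bfloat16(x):
    """Round a float to the nearest bfloat16 value (ties to even).

    Illustrative only: NaN, infinity and overflow are not handled.
    """
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits; add a rounding bias before truncating.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Statistics printed in the earlier dumps, paired with this run's values.
earlier = [2.0254, 1.2648, 10.5016, 5.4567, 2.0072, 0.3130]
rounded = [to_bfloat16(v) for v in earlier]
for before, after in zip(earlier, rounded):
    print(f"{before:.4f} -> {after:.4f}")
```

The rounded values match the figures in this run (2.0312, 1.2656, 10.5000, 5.4688, 2.0000, 0.3125), while the Min and Max entries are unchanged between runs.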
Hi Muzel, sorry for being unclear. Can you reproduce the undesired behavior (gibberish output or suspicious weights) when you load the model in transformers 4.47.1 and run inference? I am not an expert in MLX and am not sure what is happening in your environment.
No, I cannot. I do not have the capacity to rewrite the whole framework just to test that, but thanks for helping anyway!