Instructions to use Salesforce/CoDA-v0-Instruct with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use Salesforce/CoDA-v0-Instruct with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Salesforce/CoDA-v0-Instruct", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("Salesforce/CoDA-v0-Instruct", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use Salesforce/CoDA-v0-Instruct with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "Salesforce/CoDA-v0-Instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/CoDA-v0-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
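
The same request can be sent from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on `localhost:8000`; the helper names `build_chat_request` and `ask` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="Salesforce/CoDA-v0-Instruct"):
    """Build (but do not send) a POST request for the OpenAI-compatible chat endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, **kwargs):
    """Send the request and return the text of the first choice."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With a server running, `ask("What is the capital of France?")` would return the reply text. Passing `base_url="http://localhost:30000"` targets the SGLang port used further down.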

Use Docker

docker model run hf.co/Salesforce/CoDA-v0-Instruct

SGLang

How to use Salesforce/CoDA-v0-Instruct with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "Salesforce/CoDA-v0-Instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/CoDA-v0-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Salesforce/CoDA-v0-Instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Salesforce/CoDA-v0-Instruct",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use Salesforce/CoDA-v0-Instruct with Docker Model Runner:
```
docker model run hf.co/Salesforce/CoDA-v0-Instruct
```

Corrupted weights?

by Muzel - opened Nov 7, 2025

Discussion

Muzel

Nov 7, 2025

I have been trying to write an inference engine for CoDA in Swift/MLX and it only generated gibberish. I then checked the weights, e.g.:

Layer 23 kNorm loaded: shape=[128], std=1.2598647
RAW kNorm stats: mean=2.0254, min=-0.0121, max=9.0000

I then did the same thing via Google Colab and PyTorch, e.g.

--------------------
Layer: model.layers.23.self_attn.q_proj.weight
  Stats: Mean=-0.0001, Std=0.0584, Min=-0.4121, Max=0.4141
--------------------
Layer: model.layers.23.self_attn.k_proj.weight
  Stats: Mean=-0.0000, Std=0.0542, Min=-0.3867, Max=0.4062
--------------------
Layer: model.layers.23.self_attn.v_proj.weight
  Stats: Mean=0.0001, Std=0.0614, Min=-0.3945, Max=0.3496
--------------------
Layer: model.layers.23.self_attn.o_proj.weight
  Stats: Mean=0.0000, Std=0.0566, Min=-0.4785, Max=0.4375
--------------------
Layer: model.layers.23.self_attn.q_norm.weight
  Stats: Mean=1.4233, Std=0.5039, Min=-0.0302, Max=2.6094
--------------------
Layer: model.layers.23.self_attn.k_norm.weight
  Stats: Mean=2.0254, Std=1.2648, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.mlp.gate_proj.weight
  Stats: Mean=-0.0002, Std=0.0608, Min=-1.3203, Max=0.8750
--------------------
Layer: model.layers.23.mlp.up_proj.weight
  Stats: Mean=0.0000, Std=0.0683, Min=-0.7930, Max=0.7422
--------------------
Layer: model.layers.23.mlp.down_proj.weight
  Stats: Mean=-0.0000, Std=0.0622, Min=-1.0391, Max=1.1094
--------------------
Layer: model.layers.23.input_layernorm.weight
  Stats: Mean=10.5016, Std=5.4567, Min=0.0001, Max=74.5000
--------------------
Layer: model.layers.23.post_attention_layernorm.weight
  Stats: Mean=2.0072, Std=0.3130, Min=-0.0005, Max=5.1875
--------------------

I'll gladly provide all the values if needed.

But the question is: Are the weights corrupted?
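
For readers who want to reproduce this kind of check, here is a minimal sketch of how per-tensor statistics like those in the post above could be collected with PyTorch. It is not the script used in the post. The helper names `tensor_stats` and `print_stats` are hypothetical, and the small `nn.Linear` module is only a stand-in so the sketch runs without downloading any weights.

```python
import torch
from torch import nn


def tensor_stats(named_tensors, name_filter=""):
    """Return {name: (mean, std, min, max)} for tensors whose name contains name_filter."""
    stats = {}
    for name, tensor in named_tensors:
        if name_filter not in name:
            continue
        # Upcast to float32 so the statistics are not rounded to the storage dtype.
        values = tensor.detach().float()
        stats[name] = (
            values.mean().item(),
            values.std().item(),  # torch default: sample std (Bessel's correction)
            values.min().item(),
            values.max().item(),
        )
    return stats


def print_stats(stats):
    """Print the statistics in the same layout as the log in the post."""
    for name, (mean, std, lo, hi) in stats.items():
        print(f"Layer: {name}")
        print(f"  Stats: Mean={mean:.4f}, Std={std:.4f}, Min={lo:.4f}, Max={hi:.4f}")
        print("-" * 20)


# Tiny stand-in module with known values (assumption: illustration only).
demo = nn.Linear(4, 3)
with torch.no_grad():
    demo.weight.fill_(0.5)
    demo.bias.copy_(torch.tensor([1.0, 2.0, 3.0]))

demo_stats = tensor_stats(demo.named_parameters())
print_stats(demo_stats)
```

For the real model, the call would be `tensor_stats(model.named_parameters(), name_filter="model.layers.23.")` after loading it with `AutoModel.from_pretrained(..., trust_remote_code=True)`. The small gap between the Swift value (std=1.2598647) and the PyTorch value (Std=1.2648) for `k_norm` is consistent with population versus sample standard deviation over 128 elements, since 1.2598647 × sqrt(128/127) ≈ 1.2648. Both runs therefore appear to have read the same values.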

hlnchen

Nov 8, 2025

Hi Muzel, thanks for checking in. Could you share the versions in your environment, especially the transformers version?

Muzel

Nov 8, 2025

transformers: 4.57.1
torch: 2.8.0+cu126
Python 3.12

hlnchen

Nov 9, 2025

Could you try an older version, say 4.47.1?

Muzel

Nov 9, 2025

edited Nov 9, 2025

With 4.47.1:

Layer: model.layers.23.self_attn.q_proj.weight
  Stats: Mean=-0.0001, Std=0.0584, Min=-0.4121, Max=0.4141
--------------------
Layer: model.layers.23.self_attn.k_proj.weight
  Stats: Mean=-0.0000, Std=0.0542, Min=-0.3867, Max=0.4062
--------------------
Layer: model.layers.23.self_attn.v_proj.weight
  Stats: Mean=0.0001, Std=0.0614, Min=-0.3945, Max=0.3496
--------------------
Layer: model.layers.23.self_attn.o_proj.weight
  Stats: Mean=0.0000, Std=0.0566, Min=-0.4785, Max=0.4375
--------------------
Layer: model.layers.23.self_attn.q_norm.weight
  Stats: Mean=1.4233, Std=0.5039, Min=-0.0302, Max=2.6094
--------------------
Layer: model.layers.23.self_attn.k_norm.weight
  Stats: Mean=2.0254, Std=1.2648, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.mlp.gate_proj.weight
  Stats: Mean=-0.0002, Std=0.0608, Min=-1.3203, Max=0.8750
--------------------
Layer: model.layers.23.mlp.up_proj.weight
  Stats: Mean=0.0000, Std=0.0683, Min=-0.7930, Max=0.7422
--------------------
Layer: model.layers.23.mlp.down_proj.weight
  Stats: Mean=-0.0000, Std=0.0622, Min=-1.0391, Max=1.1094
--------------------
Layer: model.layers.23.input_layernorm.weight
  Stats: Mean=10.5016, Std=5.4567, Min=0.0001, Max=74.5000
--------------------
Layer: model.layers.23.post_attention_layernorm.weight
  Stats: Mean=2.0072, Std=0.3130, Min=-0.0005, Max=5.1875

hlnchen

Nov 10, 2025

Did you also experience similar behavior? We ran our post-training and evaluation under transformers 4.47.1 with bfloat16 precision.

import torch
from transformers import AutoModel, AutoTokenizer

model_name = "Salesforce/CoDA-v0-Instruct"
model = AutoModel.from_pretrained(model_name, torch_dtype=torch.bfloat16, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

Muzel

Nov 10, 2025

edited Nov 10, 2025

I don't understand what you mean by 'did you experience similar behavior', sorry. I haven't tried running inference with transformers, as my inference engine uses MLX.

I reran the weights logging with exactly your configuration:

Layer: model.layers.23.self_attn.q_proj.weight
  Stats: Mean=-0.0001, Std=0.0583, Min=-0.4121, Max=0.4141
--------------------
Layer: model.layers.23.self_attn.k_proj.weight
  Stats: Mean=-0.0000, Std=0.0542, Min=-0.3867, Max=0.4062
--------------------
Layer: model.layers.23.self_attn.v_proj.weight
  Stats: Mean=0.0001, Std=0.0615, Min=-0.3945, Max=0.3496
--------------------
Layer: model.layers.23.self_attn.o_proj.weight
  Stats: Mean=0.0000, Std=0.0566, Min=-0.4785, Max=0.4375
--------------------
Layer: model.layers.23.self_attn.q_norm.weight
  Stats: Mean=1.4219, Std=0.5039, Min=-0.0302, Max=2.6094
--------------------
Layer: model.layers.23.self_attn.k_norm.weight
  Stats: Mean=2.0312, Std=1.2656, Min=-0.0121, Max=9.0000
--------------------
Layer: model.layers.23.mlp.gate_proj.weight
  Stats: Mean=-0.0002, Std=0.0608, Min=-1.3203, Max=0.8750
--------------------
Layer: model.layers.23.mlp.up_proj.weight
  Stats: Mean=0.0000, Std=0.0684, Min=-0.7930, Max=0.7422
--------------------
Layer: model.layers.23.mlp.down_proj.weight
  Stats: Mean=-0.0000, Std=0.0623, Min=-1.0391, Max=1.1094
--------------------
Layer: model.layers.23.input_layernorm.weight
  Stats: Mean=10.5000, Std=5.4688, Min=0.0001, Max=74.5000
--------------------
Layer: model.layers.23.post_attention_layernorm.weight
  Stats: Mean=2.0000, Std=0.3125, Min=-0.0005, Max=5.1875
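
The small differences between this dump and the earlier ones (for example Mean=2.0312 instead of 2.0254) look like rounding rather than different weights. One plausible reading is that the statistics were computed directly on bfloat16 tensors this time. The sketch below, using only the standard library, rounds a few of the earlier values to the nearest bfloat16 to illustrate this. The helper `round_to_bfloat16` is hypothetical, ignores NaN and overflow handling, and takes the printed four-decimal values as inputs, which is itself an approximation.

```python
import struct


def round_to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest, ties-to-even)."""
    # Reinterpret the float32 encoding of x as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the top 16 bits; add a bias so truncation rounds to nearest even.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


# Values taken from the float statistics printed earlier in the thread.
earlier_values = [1.4233, 2.0254, 10.5016, 5.4567, 2.0072, 0.3130]
rounded = {v: round_to_bfloat16(v) for v in earlier_values}
for value, as_bf16 in rounded.items():
    print(f"{value:.4f} -> {as_bf16:.4f}")
```

The rounded results are 1.421875, 2.03125, 10.5, 5.46875, 2.0 and 0.3125, which line up with the Mean and Std values in the dump above (1.4219, 2.0312, 10.5000, 5.4688, 2.0000, 0.3125). The Min and Max columns are unchanged across all three dumps.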

hlnchen

Nov 10, 2025

Hi Muzel, sorry for the unclear context. Could you reproduce the undesired model behavior or the suspicious weights when loading the model in transformers 4.47.1 and running inference? I am not an expert in MLX, so I am not sure what is happening in your environment.

Muzel

Nov 10, 2025

No, I cannot, I do not have the capacity to rewrite the whole framework just to test that - but thanks for helping anyway!

Muzel changed discussion status to closed Nov 12, 2025
