Instructions to use moelanoby/phi-3-M3-coder with libraries, inference providers, notebooks, and local apps.

Libraries

How to use moelanoby/phi-3-M3-coder with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="moelanoby/phi-3-M3-coder", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
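For chat-style input like `messages`, the text-generation pipeline returns a list with one dict per conversation. That dict's `"generated_text"` field holds the full message list, with the model's reply appended as the last turn. The sketch below pulls out that reply and assumes this output shape; `last_reply` is a hypothetical helper, not part of Transformers:

```python
def last_reply(pipeline_output):
    """Return the content of the most recent assistant message.

    Assumes the chat-style pipeline output shape:
    [{"generated_text": [{"role": ..., "content": ...}, ...]}]
    """
    conversation = pipeline_output[0]["generated_text"]
    # Walk backwards so the latest assistant turn is returned.
    for message in reversed(conversation):
        if message.get("role") == "assistant":
            return message["content"]
    raise ValueError("no assistant message in pipeline output")

# Usage with the pipeline above:
# print(last_reply(pipe(messages)))
```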

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("moelanoby/phi-3-M3-coder", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("moelanoby/phi-3-M3-coder", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (everything after the prompt)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use moelanoby/phi-3-M3-coder with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "moelanoby/phi-3-M3-coder" --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moelanoby/phi-3-M3-coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
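You can also send the same request from Python using only the standard library. The sketch below assumes the vLLM server above is listening on `http://localhost:8000`. `build_chat_request` is a hypothetical helper name. It only builds the request, so you can inspect it before anything is sent:

```python
import json
import urllib.request

def build_chat_request(base_url, model, prompt):
    """Build a POST request for an OpenAI-compatible /v1/chat/completions endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Usage (requires the server to be running):
# req = build_chat_request("http://localhost:8000", "moelanoby/phi-3-M3-coder",
#                          "What is the capital of France?")
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Send the request with `urllib.request.urlopen(req)` once the server is up. The reply text is under `choices[0].message.content` in the JSON response.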

Use Docker

docker model run hf.co/moelanoby/phi-3-M3-coder

SGLang

How to use moelanoby/phi-3-M3-coder with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "moelanoby/phi-3-M3-coder" \
    --host 0.0.0.0 \
    --port 30000 \
    --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moelanoby/phi-3-M3-coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
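The SGLang server uses the same OpenAI-compatible chat completions format, so you can unpack its JSON response the same way for either backend. The sketch below assumes a standard response with a `choices` list; `extract_reply` is a hypothetical helper:

```python
import json

def extract_reply(response_body):
    """Return the assistant text from an OpenAI-compatible chat completion
    response body, given as bytes or str."""
    data = json.loads(response_body)
    choices = data.get("choices") or []
    if not choices:
        raise ValueError(f"no choices in response: {data!r}")
    # The first choice holds the generated message.
    return choices[0]["message"]["content"]
```

Pass it the raw body returned by the server, for example `resp.read()` from `urllib.request.urlopen`.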

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "moelanoby/phi-3-M3-coder" \
        --host 0.0.0.0 \
        --port 30000 \
        --trust-remote-code
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "moelanoby/phi-3-M3-coder",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use moelanoby/phi-3-M3-coder with Docker Model Runner:
```
docker model run hf.co/moelanoby/phi-3-M3-coder
```

🚩 Report: Fake & Spam

by Daemontatox - opened Jul 3, 2025

Discussion

Daemontatox

Jul 3, 2025

False claims, fake numbers and benchmarks; just fishing for likes.

Slouchy947887497

Jul 3, 2025

Agreed

ehartford

Jul 3, 2025

It's not "spam".

Read the code.

That his evaluation methods are not convincing or complete doesn't render the model meritless.

Daemontatox

Jul 3, 2025

I downloaded the model and tried it myself before submitting this, and it's nowhere near what it claims.

Comparing a 4B model to SOTA models and claiming to outperform them by a mile is just too far-fetched.

It falls under the "omg, this small fine-tuned model is better than SOTA models" hype spam.

Slouchy947887497

Jul 3, 2025

> It's not "spam".
>
> Read the code.
>
> That his evaluation methods are not convincing or complete doesn't render the model meritless.

No, it's spam because it's the same model. I'm not even sure there was much tuning. It performs exactly the same as phi3-mini on simple questions.

win10

Jul 4, 2025

> > It's not "spam".
> >
> > Read the code.
> >
> > That his evaluation methods are not convincing or complete doesn't render the model meritless.
>
> No, it's spam because it's the same model. I'm not even sure there was much tuning. It performs exactly the same as phi3-mini on simple questions.

I am currently trying to implement the LLAMA version, which is an improvement on the architecture, and I hope my implementation is correct. This is not just fine-tuning; it is an architectural improvement. Very interesting~

win10

Jul 4, 2025


edited Jul 4, 2025

> > It's not "spam".
> >
> > Read the code.
> >
> > That his evaluation methods are not convincing or complete doesn't render the model meritless.
>
> No, it's spam because it's the same model. I'm not even sure there was much tuning. It performs exactly the same as phi3-mini on simple questions.

Perhaps the original weights were frozen and only the added components were trained.
Even though I have explained all this, there are always some ignorant and unmotivated people who just deny it.
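The setup win10 suggests, freezing the pretrained weights and training only newly added modules, is a standard PyTorch pattern. The sketch below uses a toy model. `base` stands in for the original Phi-3 weights and `added` for whatever components the architecture change introduces. Neither name comes from this repository, and the sketch makes no claim about how this model was actually trained:

```python
import torch
from torch import nn

class ToyAugmentedModel(nn.Module):
    """Hypothetical stand-in: a pretrained `base` plus a newly added module."""

    def __init__(self, dim=16):
        super().__init__()
        self.base = nn.Linear(dim, dim)   # plays the role of the original weights
        self.added = nn.Linear(dim, dim)  # plays the role of the new components

    def forward(self, x):
        h = self.base(x)
        return h + self.added(h)  # residual path through the added module

model = ToyAugmentedModel()

# Freeze everything, then re-enable gradients only for the added module.
for p in model.parameters():
    p.requires_grad = False
for p in model.added.parameters():
    p.requires_grad = True

# Give the optimizer only the trainable parameters.
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

# Snapshot of a frozen weight, to confirm it is untouched by the update.
base_before = model.base.weight.detach().clone()

# One training step: gradients flow only into the added module.
x = torch.randn(4, 16)
loss = model(x).pow(2).mean()
loss.backward()
optimizer.step()
```

Passing only the trainable parameters to the optimizer keeps its state small. The `requires_grad=False` flags also stop autograd from computing gradients for the frozen base.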

Daemontatox changed discussion title from 🚩 Report: Spam to 🚩 Report: Fake Claims Jul 4, 2025

Daemontatox changed discussion title from 🚩 Report: Fake Claims to 🚩 Report: Fake & Spam Jul 4, 2025

Daemontatox changed discussion status to closed Jul 6, 2025

Daemontatox changed discussion status to open Jul 6, 2025

ehartford

Jul 6, 2025

Daemontatox changed discussion status to closed Dec 8, 2025
