cono3-mini
Compact. Capable. Code-native.
cono3-mini is a 9B-parameter language model purpose-built for autonomous software engineering. Starting from the Qwen3.5-9B foundation, we applied LoRA-based supervised fine-tuning on a large-scale corpus of real coding agent sessions — teaching the model not just what to write, but how to navigate codebases, recover from mistakes, and work iteratively like a human developer.
Why cono3-mini?
Most code models are trained on static code completion. cono3-mini takes a different approach — it was fine-tuned on a large collection of real-world coding agent sessions where AI systems solved engineering problems end-to-end: reading files, running commands, interpreting errors, editing code, and verifying results.
This gives cono3-mini behaviors you won't find in a typical code LLM:
- Reads before writing — inspects existing code and project structure before making changes
- Handles failures gracefully — parses compiler/linter output and self-corrects
- Applies surgical edits — produces minimal diffs rather than rewriting entire files
- Reasons step-by-step — uses `<think>...</think>` blocks to decompose complex tasks (a parsing sketch follows this list)
- Supports 262K tokens — natively handles very large contexts, extensible beyond 1M
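As a minimal illustration of the reasoning format, the sketch below separates a `<think>...</think>` block from the final answer in a generated completion. The `split_reasoning` helper is hypothetical (it is not part of the model's API) and assumes at most one think block per completion.

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split a completion into (reasoning, answer) around a <think> block.

    Hypothetical helper for illustration; assumes at most one block.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

reasoning, answer = split_reasoning(
    "<think>The function nests callbacks; convert each to await.</think>Here is the refactor..."
)
print(reasoning)  # -> The function nests callbacks; convert each to await.
print(answer)     # -> Here is the refactor...
```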
The model is fully open under Apache 2.0 with no usage restrictions.
Getting Started
Using Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "theblackhacker/cono3-mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = [
    {"role": "system", "content": "You are an expert software engineer."},
    {"role": "user", "content": "Refactor this function to use async/await instead of callbacks."},
]

# Render the chat template, then sample with the recommended settings.
# do_sample=True is required for temperature/top_p/top_k to take effect.
text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
out = model.generate(
    **inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, top_k=20
)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
```
Using vLLM
```bash
vllm serve theblackhacker/cono3-mini --tensor-parallel-size 1 --max-model-len 65536
```
```python
from openai import OpenAI

# Point the OpenAI-compatible client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
resp = client.chat.completions.create(
    model="theblackhacker/cono3-mini",  # must match the name the server was started with
    messages=[{"role": "user", "content": "How do I set up a GitHub Actions CI pipeline for a Rust project?"}],
    temperature=0.6,
)
print(resp.choices[0].message.content)
```
How It Was Trained
cono3-mini was created by fine-tuning Qwen3.5-9B using LoRA (rank 64, alpha 32) on a curated dataset of agentic coding trajectories. The training data captures full coding sessions — not isolated Q&A pairs — from multiple frontier models, operating within real scaffolding environments.
| Setting | Value |
| --- | --- |
| Base | Qwen3.5-9B |
| Technique | LoRA SFT (r=64, α=32) |
| Data | Curated agentic coding sessions |
| Packing | Sequence packing |
| Tooling | Axolotl |
| Precision | bf16 |
| Optimizer | AdamW, cosine decay |
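For a sense of what the adapter setup looks like in code, here is a minimal sketch of an equivalent PEFT configuration. The `r` and `lora_alpha` values come from the table above; `target_modules` and `lora_dropout` are assumptions, since the card does not list them, and the actual run was configured through Axolotl rather than PEFT directly.

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,               # from the table above
    lora_alpha=32,      # from the table above
    lora_dropout=0.05,  # assumed; not stated on the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed
    task_type="CAUSAL_LM",
)
```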
Under the Hood
cono3-mini retains the Qwen3.5 hybrid architecture that alternates Gated Delta Network layers (efficient linear attention) with standard multi-head attention. This design excels at long-range dependencies while keeping memory and compute practical at 9B scale.
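To see the layer pattern for yourself, you can inspect the model configuration. The sketch below is a rough probe, not an official API: the attribute that records per-layer mixer types (assumed here to be `layer_types`) varies across architectures and transformers versions.

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("theblackhacker/cono3-mini")
# Hybrid configs often record which layers use linear attention vs. full
# attention; "layer_types" is an assumed attribute name and may differ.
print(getattr(config, "layer_types", config))
```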
Inference Settings
| Parameter | Value |
| --- | --- |
| Temperature | 0.6 |
| Top-P | 0.95 |
| Top-K | 20 |
| Presence Penalty | 0.0 |
Tip: For tool-use or agentic workflows, drop temperature to 0.2–0.4 for more predictable outputs.
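As a usage example for that tip, the request below reuses the vLLM server from Getting Started with a lower temperature; 0.3 is simply one point in the recommended 0.2–0.4 range.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
resp = client.chat.completions.create(
    model="theblackhacker/cono3-mini",
    # Lower temperature within the recommended 0.2-0.4 range for tool use.
    messages=[{"role": "user", "content": "List the shell commands to run this project's test suite, one per line."}],
    temperature=0.3,
)
print(resp.choices[0].message.content)
```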
Known Limitations
- Primarily evaluated on English-language tasks; multilingual performance may vary.
- Best results come from prompting patterns similar to the agent scaffolding used in training data.
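The exact scaffolding format used in training is not documented on this card, so the system prompt below is purely hypothetical: an illustration of the agent-style framing the second point refers to, not a reproduction of the training setup.

```python
# Hypothetical agent-style system prompt; the actual training scaffolding
# is not documented on this card.
SYSTEM_PROMPT = (
    "You are an autonomous software engineer. Read the relevant files before "
    "editing, make minimal diffs, run the tests after each change, and wrap "
    "your reasoning in <think>...</think> before your final answer."
)
```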
Credits
Built on top of Qwen3.5-9B by the Qwen team. Training powered by Axolotl.
Citation
```bibtex
@misc{cono3mini2026,
  title  = {cono3-mini: Autonomous Coding Agent Built on Qwen3.5-9B},
  author = {Cono3},
  year   = {2026},
  url    = {https://huggingface.co/theblackhacker/cono3-mini}
}
```