cono3-mini

Compact. Capable. Code-native.

cono3-mini is a 9B-parameter language model purpose-built for autonomous software engineering. Starting from the Qwen3.5-9B foundation, we applied supervised fine-tuning with LoRA on a large-scale corpus of real coding agent sessions — teaching the model not just what to write, but how to navigate codebases, recover from mistakes, and work iteratively like a human developer.

Why cono3-mini?

Most code models are trained on static code completion. cono3-mini takes a different approach — it was fine-tuned on a large collection of real-world coding agent sessions where AI systems solved engineering problems end-to-end: reading files, running commands, interpreting errors, editing code, and verifying results.

This gives cono3-mini behaviors you won't find in a typical code LLM:

  • Reads before writing — inspects existing code and project structure before making changes
  • Handles failures gracefully — parses compiler/linter output and self-corrects
  • Applies surgical edits — produces minimal diffs rather than rewriting entire files
  • Reasons step-by-step — uses <think>...</think> blocks to decompose complex tasks
  • Supports 262K tokens — natively handles very large contexts, extensible beyond 1M

The model is fully open under Apache 2.0 with no usage restrictions.
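The "surgical edits" behavior is easiest to picture as a unified diff that touches only the changed lines. A minimal illustration using Python's standard-library difflib (the file name and snippet are hypothetical, not taken from the training data):

```python
import difflib

# Hypothetical before/after versions of a file an agent might edit.
before = ["def add(a, b):\n", "    return a+b\n"]
after  = ["def add(a, b):\n", "    return a + b\n"]

# A surgical edit shows up as a one-line change, not a full rewrite.
diff = list(difflib.unified_diff(before, after, fromfile="utils.py", tofile="utils.py"))
print("".join(diff))
```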


Getting Started

Using Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "theblackhacker/cono3-mini"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

prompt = [
    {"role": "system", "content": "You are an expert software engineer."},
    {"role": "user", "content": "Refactor this function to use async/await instead of callbacks."},
]

text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([text], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.6, top_p=0.95, top_k=20)
print(tokenizer.decode(out[0][inputs.input_ids.shape[-1]:], skip_special_tokens=True))
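Because the model may emit <think>...</think> blocks before its answer, you will often want to separate the reasoning from the final text. A minimal sketch — the helper name split_reasoning is ours, not part of any model or library API:

```python
import re

def split_reasoning(text: str):
    """Split generated text into <think> reasoning blocks and the final answer.
    (Hypothetical helper, not part of the model's API.)"""
    thoughts = re.findall(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    answer = re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()
    return thoughts, answer

thoughts, answer = split_reasoning("<think>plan the refactor</think>Use async/await here.")
```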

Using vLLM

Start an OpenAI-compatible server:

vllm serve theblackhacker/cono3-mini --tensor-parallel-size 1 --max-model-len 65536

Then query it with the OpenAI client:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="token")
resp = client.chat.completions.create(
    model="theblackhacker/cono3-mini",
    messages=[{"role": "user", "content": "How do I set up a GitHub Actions CI pipeline for a Rust project?"}],
    temperature=0.6,
)
print(resp.choices[0].message.content)

How It Was Trained

cono3-mini was created by fine-tuning Qwen3.5-9B using LoRA (rank 64, alpha 32) on a curated dataset of agentic coding trajectories. The training data captures full coding sessions — not isolated Q&A pairs — from multiple frontier models operating within real scaffolding environments.

Base:      Qwen3.5-9B
Technique: LoRA SFT (r=64, α=32)
Data:      Curated agentic coding sessions
Packing:   Sequence packing
Tooling:   Axolotl
Precision: bf16
Optimizer: AdamW, cosine decay
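For readers unfamiliar with the r and α hyperparameters above: LoRA learns a low-rank update B·A that is scaled by α/r (here 32/64 = 0.5) before being added to the frozen base weight. A toy numpy sketch of that arithmetic — dimensions are illustrative, not the model's actual layer sizes:

```python
import numpy as np

d, r, alpha = 64, 64, 32          # hidden size (illustrative); rank and alpha from the card
A = np.random.randn(r, d) * 0.01  # A is initialized small
B = np.zeros((d, r))              # B starts at zero, so the update is initially a no-op
scaling = alpha / r               # 32 / 64 = 0.5
delta_W = scaling * (B @ A)       # low-rank update added to the frozen weight W
```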

Under the Hood

cono3-mini retains the Qwen3.5 hybrid architecture that alternates Gated Delta Network layers (efficient linear attention) with standard multi-head attention. This design excels at long-range dependencies while keeping memory and compute practical at 9B scale.


Inference Settings

Temperature:      0.6
Top-P:            0.95
Top-K:            20
Presence Penalty: 0.0

Tip: For tool-use or agentic workflows, drop temperature to 0.2–0.4 for more predictable outputs.
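These settings can be kept as small presets in client code; the AGENT preset below follows the tip above (preset names are ours, purely illustrative):

```python
# Default sampling preset matching the table above.
CHAT = {"temperature": 0.6, "top_p": 0.95, "top_k": 20, "presence_penalty": 0.0}

# For tool-use / agentic workflows, lower the temperature as suggested.
AGENT = {**CHAT, "temperature": 0.3}
```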

Known Limitations

  • Primarily evaluated on English-language tasks; multilingual performance may vary.
  • Best results come from prompting patterns similar to the agent scaffolding used in training data.

Credits

Built on top of Qwen3.5-9B by the Qwen team. Training powered by Axolotl.

Citation

@misc{cono3mini2026,
  title   = {cono3-mini: Autonomous Coding Agent Built on Qwen3.5-9B},
  author  = {Cono3},
  year    = {2026},
  url     = {https://huggingface.co/theblackhacker/cono3-mini}
}