Instructions to use LocoreMind/LocoOperator-4B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use LocoreMind/LocoOperator-4B-GGUF with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="LocoreMind/LocoOperator-4B-GGUF",
	filename="LocoOperator-4B.IQ4_XS.gguf",
)

llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Notebooks
Google Colab
Kaggle
Local Apps

llama.cpp

How to use LocoreMind/LocoOperator-4B-GGUF with llama.cpp:

Install from brew

brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M

Use Docker

docker model run hf.co/LocoreMind/LocoOperator-4B-GGUF:Q4_K_M

LM Studio
Jan
Ollama
How to use LocoreMind/LocoOperator-4B-GGUF with Ollama:
```
ollama run hf.co/LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
```

Unsloth Studio new

How to use LocoreMind/LocoOperator-4B-GGUF with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LocoreMind/LocoOperator-4B-GGUF to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for LocoreMind/LocoOperator-4B-GGUF to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for LocoreMind/LocoOperator-4B-GGUF to start chatting

Pi new

How to use LocoreMind/LocoOperator-4B-GGUF with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "LocoreMind/LocoOperator-4B-GGUF:Q4_K_M"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent new

How to use LocoreMind/LocoOperator-4B-GGUF with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default LocoreMind/LocoOperator-4B-GGUF:Q4_K_M

Run Hermes

hermes

Docker Model Runner
How to use LocoreMind/LocoOperator-4B-GGUF with Docker Model Runner:
```
docker model run hf.co/LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
```

Lemonade

How to use LocoreMind/LocoOperator-4B-GGUF with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull LocoreMind/LocoOperator-4B-GGUF:Q4_K_M

Run and chat with the model

lemonade run user.LocoOperator-4B-GGUF-Q4_K_M

List all available models

lemonade list

Works great on Claude Code x llama-server!

by bukit - opened Apr 12

Discussion

bukit

Apr 12

•

edited Apr 12

TLDR: Q8_0 pass all tests, without any hiccups, really underrated model on huggingface, it beats qwen3.5 9b and gemma4 E4B for agentic ai coding!

Wishlist: LocoOperator based on Qwen3.5 9B or Gemma4..

Great work LocoreMind!

Here are tests 6 through 10. These tests push beyond basic file editing and test the advanced reasoning, external environment interaction, and multi-step logic of your local 9B model.

If your model can pass these, it is performing at the level of models 3x to 8x its size.

Test 6: The External Dependency Test

Goal: See if the model can write code that requires third-party packages, realize it needs to install them, and execute the installation command in your terminal. Small models often hallucinate that packages are already installed or panic when an ImportError occurs.

Your Setup:
Ensure your terminal is in your test folder. (If you are using Python in a virtual environment, activate it first so it doesn't install globally).

Prompt to the Agent:

"Write a Python script called fetch_cat.py that uses the external requests library to fetch a random cat fact from https://catfact.ninja/fact and prints it. If the requests library is not installed, use pip to install it. Then run the script."

Pass condition:

It writes the script.
It either proactively runs pip install requests OR it runs the script, gets a ModuleNotFoundError, reads the error, runs pip install, and retries.
It successfully outputs a cat fact.

Test 7: Data Parsing & Mutation (JSON)

Goal: Test if the model can read a structured data file, understand its schema, write code to mutate it, and output a new file without hallucinating or losing data.

Your Setup:
Create a file named users.json and paste this exactly:

[
  {"name": "Alice", "age": 25, "status": "active"},
  {"name": "Bob", "age": 17, "status": "active"},
  {"name": "Charlie", "age": 30, "status": "inactive"},
  {"name": "Diana", "age": 22, "status": "active"}
]

Prompt to the Agent:

"Read users.json. Write a script called filter.py that loads this data, removes any user who is under 18 OR who has an 'inactive' status, and saves the remaining users to a new file called valid_users.json. Run the script, and then read valid_users.json to prove it worked."

Pass condition: It correctly writes the script, runs it, and reads the output file. The final valid_users.json should only contain Alice and Diana. (9B models often mess up the boolean logic: under 18 OR inactive).

Test 8: Nested File Architecture

Goal: Test if the agent can use OS-level tools to create directories (folders) and manage relative paths. Small models often create everything in the root folder because they struggle with mkdir commands.

Your Setup:
An empty folder.

Prompt to the Agent:

"Scaffold a basic web project. Create a folder called public. Inside public, create an index.html file. Also inside public, create two more folders: css and js. Create styles.css in the css folder, and app.js in the js folder. Finally, make sure the index.html file links to both the css and js files using correct relative paths."

Pass condition: The model uses mkdir (or a python script with os.makedirs) to create the nested folders. If you open public/index.html, it must have <link rel="stylesheet" href="css/styles.css"> and <script src="js/app.js"></script>.

Test 9: Log Analysis & Regex Extraction

Goal: See if the model can parse unstructured text, write precise extraction logic (regex or string matching), and count results.

Your Setup:
Create a file named server.log and paste this:

[INFO] 10:00:01 - Server started successfully.
[WARN] 10:05:22 - Memory usage at 80%
[ERROR] 10:06:01 - Connection timeout from IP 192.168.1.50
[INFO] 10:07:15 - User 'admin' logged in.
[ERROR] 10:10:44 - Database query failed: Syntax error.
[WARN] 10:12:00 - High latency detected.
[ERROR] 10:15:30 - Disk space critically low.

Prompt to the Agent:

"Analyze server.log. Write a shell command or a Python script to extract only the lines that contain '[ERROR]'. Save those lines into a new file called critical_errors.txt. Then, tell me exactly how many errors there were."

Pass condition: The model writes a script or uses grep '[ERROR]' server.log > critical_errors.txt. It must report back that there are exactly 3 errors. (Small models often hallucinate the count or include the WARN lines by mistake).

Test 10: Multi-File Bug Tracing (The Final Boss)

Goal: This tests deep context window retention. The model has to trace a stack trace across three different files, find the root cause (which is in a different file than where the crash happens), and fix it.

Your Setup:
Create three files exactly as written below.

File 1: config.py

# The bug is here: max_retries should be an integer, not a string
SETTINGS = {
    "max_retries": "3",
    "timeout": 10
}

File 2: processor.py

from config import SETTINGS

def process_data(data):
    retries = SETTINGS["max_retries"]
    # It will crash here when it tries to add an int to a string
    target_attempts = retries + 1 
    return f"Processing {data} with {target_attempts} attempts allowed."

File 3: main.py

from processor import process_data

if __name__ == "__main__":
    print("Starting application...")
    result = process_data("Test Payload")
    print(result)

Prompt to the Agent:

"Run main.py. It will crash with a TypeError. Follow the stack trace, read the connected files to find the root cause, fix the bug, and run main.py again until it succeeds."

Pass condition:

It runs main.py and sees the error happens in processor.py.
It looks at processor.py and sees retries comes from config.py.
It opens config.py, changes "3" to 3 (removes the quotes).
It re-runs main.py and succeeds.
(Note: A common failure for small models is to "hack" the fix by changing processor.py to target_attempts = int(retries) + 1. While technically functional, a truly smart agent will fix the root cause in config.py).

How to evaluate 6-10:

If your 9B model completes Test 10, you have an absolute powerhouse of a local setup. Multi-file debugging is currently the benchmark that separates standard open-source models from flagship models like GPT-4o or Claude 3.5 Sonnet.

llama-server --host 0.0.0.0 --port 9099 -ngl 99 -fa on -c 65536 --kv-unified --fit on --cache-type-k q8_0 --cache-type-v q8_0 --jinja --api-key "your-llama-api-key" -m Z:\path\to\LocoOperator-4B.Q8_0.gguf --reasoning auto

FutureMa

LocoreMind org Apr 13

Hi @bukit ,

Thank you so much for the hardcore Agent tests! It's awesome to see the 4B general model nail cross-file debugging and OS interactions.

I noticed your wishlist for a Qwen3.5 9B model. I actually just released a new 9B model, but with a very specific focus: CoPaw-Flash-9B-DataAnalyst-LoRA.

Link: https://huggingface.co/jason1966/CoPaw-Flash-9B-DataAnalyst-LoRA

It is specifically designed as an Agentic Data Analyst. Instead of general software development, it is heavily trained to autonomously load datasets (CSV/Excel/JSON), write Python to perform EDA, generate charts, and summarize insights.

It averages 26 continuous, autonomous iterations to complete a full data pipeline with zero human intervention.

If you ever need an agent to crunch data or want to test an autonomous data workflow, I’d love for you to give this 9B analyst a spin! Thanks again for the huge support!

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment