Instructions to use LocoreMind/LocoOperator-4B-GGUF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use LocoreMind/LocoOperator-4B-GGUF with llama-cpp-python:
# !pip install llama-cpp-python from llama_cpp import Llama llm = Llama.from_pretrained( repo_id="LocoreMind/LocoOperator-4B-GGUF", filename="LocoOperator-4B.IQ4_XS.gguf", )
llm.create_chat_completion( messages = "No input example has been defined for this model task." )
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use LocoreMind/LocoOperator-4B-GGUF with llama.cpp:
Install from brew
brew install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp # Start a local OpenAI-compatible server with a web UI: llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M # Run inference directly in the terminal: llama-cli -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
Use pre-built binary
# Download pre-built binary from: # https://github.com/ggerganov/llama.cpp/releases # Start a local OpenAI-compatible server with a web UI: ./llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M # Run inference directly in the terminal: ./llama-cli -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git cd llama.cpp cmake -B build cmake --build build -j --target llama-server llama-cli # Start a local OpenAI-compatible server with a web UI: ./build/bin/llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M # Run inference directly in the terminal: ./build/bin/llama-cli -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
Use Docker
docker model run hf.co/LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use LocoreMind/LocoOperator-4B-GGUF with Ollama:
ollama run hf.co/LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
- Unsloth Studio new
How to use LocoreMind/LocoOperator-4B-GGUF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for LocoreMind/LocoOperator-4B-GGUF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex # Run unsloth studio unsloth studio -H 0.0.0.0 -p 8888 # Then open http://localhost:8888 in your browser # Search for LocoreMind/LocoOperator-4B-GGUF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required # Open https://huggingface.co/spaces/unsloth/studio in your browser # Search for LocoreMind/LocoOperator-4B-GGUF to start chatting
- Pi new
How to use LocoreMind/LocoOperator-4B-GGUF with Pi:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
Configure the model in Pi
# Install Pi: npm install -g @mariozechner/pi-coding-agent # Add to ~/.pi/agent/models.json: { "providers": { "llama-cpp": { "baseUrl": "http://localhost:8080/v1", "api": "openai-completions", "apiKey": "none", "models": [ { "id": "LocoreMind/LocoOperator-4B-GGUF:Q4_K_M" } ] } } }Run Pi
# Start Pi in your project directory: pi
- Hermes Agent new
How to use LocoreMind/LocoOperator-4B-GGUF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp: brew install llama.cpp # Start a local OpenAI-compatible server: llama-server -hf LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
Configure Hermes
# Install Hermes: curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash hermes setup # Point Hermes at the local server: hermes config set model.provider custom hermes config set model.base_url http://127.0.0.1:8080/v1 hermes config set model.default LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use LocoreMind/LocoOperator-4B-GGUF with Docker Model Runner:
docker model run hf.co/LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
- Lemonade
How to use LocoreMind/LocoOperator-4B-GGUF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/ lemonade pull LocoreMind/LocoOperator-4B-GGUF:Q4_K_M
Run and chat with the model
lemonade run user.LocoOperator-4B-GGUF-Q4_K_M
List all available models
lemonade list
Works great on Claude Code x llama-server!
TLDR: Q8_0 pass all tests, without any hiccups, really underrated model on huggingface, it beats qwen3.5 9b and gemma4 E4B for agentic ai coding!
Wishlist: LocoOperator based on Qwen3.5 9B or Gemma4..
Great work LocoreMind!
Here are tests 6 through 10. These tests push beyond basic file editing and test the advanced reasoning, external environment interaction, and multi-step logic of your local 9B model.
If your model can pass these, it is performing at the level of models 3x to 8x its size.
Test 6: The External Dependency Test
Goal: See if the model can write code that requires third-party packages, realize it needs to install them, and execute the installation command in your terminal. Small models often hallucinate that packages are already installed or panic when an ImportError occurs.
Your Setup:
Ensure your terminal is in your test folder. (If you are using Python in a virtual environment, activate it first so it doesn't install globally).
Prompt to the Agent:
"Write a Python script called
fetch_cat.pythat uses the externalrequestslibrary to fetch a random cat fact fromhttps://catfact.ninja/factand prints it. If therequestslibrary is not installed, use pip to install it. Then run the script."
Pass condition:
- It writes the script.
- It either proactively runs
pip install requestsOR it runs the script, gets aModuleNotFoundError, reads the error, runspip install, and retries. - It successfully outputs a cat fact.
Test 7: Data Parsing & Mutation (JSON)
Goal: Test if the model can read a structured data file, understand its schema, write code to mutate it, and output a new file without hallucinating or losing data.
Your Setup:
Create a file named users.json and paste this exactly:
[
{"name": "Alice", "age": 25, "status": "active"},
{"name": "Bob", "age": 17, "status": "active"},
{"name": "Charlie", "age": 30, "status": "inactive"},
{"name": "Diana", "age": 22, "status": "active"}
]
Prompt to the Agent:
"Read
users.json. Write a script calledfilter.pythat loads this data, removes any user who is under 18 OR who has an 'inactive' status, and saves the remaining users to a new file calledvalid_users.json. Run the script, and then readvalid_users.jsonto prove it worked."
Pass condition: It correctly writes the script, runs it, and reads the output file. The final valid_users.json should only contain Alice and Diana. (9B models often mess up the boolean logic: under 18 OR inactive).
Test 8: Nested File Architecture
Goal: Test if the agent can use OS-level tools to create directories (folders) and manage relative paths. Small models often create everything in the root folder because they struggle with mkdir commands.
Your Setup:
An empty folder.
Prompt to the Agent:
"Scaffold a basic web project. Create a folder called
public. Insidepublic, create anindex.htmlfile. Also insidepublic, create two more folders:cssandjs. Createstyles.cssin thecssfolder, andapp.jsin thejsfolder. Finally, make sure theindex.htmlfile links to both the css and js files using correct relative paths."
Pass condition: The model uses mkdir (or a python script with os.makedirs) to create the nested folders. If you open public/index.html, it must have <link rel="stylesheet" href="css/styles.css"> and <script src="js/app.js"></script>.
Test 9: Log Analysis & Regex Extraction
Goal: See if the model can parse unstructured text, write precise extraction logic (regex or string matching), and count results.
Your Setup:
Create a file named server.log and paste this:
[INFO] 10:00:01 - Server started successfully.
[WARN] 10:05:22 - Memory usage at 80%
[ERROR] 10:06:01 - Connection timeout from IP 192.168.1.50
[INFO] 10:07:15 - User 'admin' logged in.
[ERROR] 10:10:44 - Database query failed: Syntax error.
[WARN] 10:12:00 - High latency detected.
[ERROR] 10:15:30 - Disk space critically low.
Prompt to the Agent:
"Analyze
server.log. Write a shell command or a Python script to extract only the lines that contain '[ERROR]'. Save those lines into a new file calledcritical_errors.txt. Then, tell me exactly how many errors there were."
Pass condition: The model writes a script or uses grep '[ERROR]' server.log > critical_errors.txt. It must report back that there are exactly 3 errors. (Small models often hallucinate the count or include the WARN lines by mistake).
Test 10: Multi-File Bug Tracing (The Final Boss)
Goal: This tests deep context window retention. The model has to trace a stack trace across three different files, find the root cause (which is in a different file than where the crash happens), and fix it.
Your Setup:
Create three files exactly as written below.
File 1: config.py
# The bug is here: max_retries should be an integer, not a string
SETTINGS = {
"max_retries": "3",
"timeout": 10
}
File 2: processor.py
from config import SETTINGS
def process_data(data):
retries = SETTINGS["max_retries"]
# It will crash here when it tries to add an int to a string
target_attempts = retries + 1
return f"Processing {data} with {target_attempts} attempts allowed."
File 3: main.py
from processor import process_data
if __name__ == "__main__":
print("Starting application...")
result = process_data("Test Payload")
print(result)
Prompt to the Agent:
"Run
main.py. It will crash with a TypeError. Follow the stack trace, read the connected files to find the root cause, fix the bug, and runmain.pyagain until it succeeds."
Pass condition:
- It runs
main.pyand sees the error happens inprocessor.py. - It looks at
processor.pyand seesretriescomes fromconfig.py. - It opens
config.py, changes"3"to3(removes the quotes). - It re-runs
main.pyand succeeds.
(Note: A common failure for small models is to "hack" the fix by changingprocessor.pytotarget_attempts = int(retries) + 1. While technically functional, a truly smart agent will fix the root cause inconfig.py).
How to evaluate 6-10:
If your 9B model completes Test 10, you have an absolute powerhouse of a local setup. Multi-file debugging is currently the benchmark that separates standard open-source models from flagship models like GPT-4o or Claude 3.5 Sonnet.
llama-server --host 0.0.0.0 --port 9099 -ngl 99 -fa on -c 65536 --kv-unified --fit on --cache-type-k q8_0 --cache-type-v q8_0 --jinja --api-key "your-llama-api-key" -m Z:\path\to\LocoOperator-4B.Q8_0.gguf --reasoning auto
Hi @bukit ,
Thank you so much for the hardcore Agent tests! It's awesome to see the 4B general model nail cross-file debugging and OS interactions.
I noticed your wishlist for a Qwen3.5 9B model. I actually just released a new 9B model, but with a very specific focus: CoPaw-Flash-9B-DataAnalyst-LoRA.
Link: https://huggingface.co/jason1966/CoPaw-Flash-9B-DataAnalyst-LoRA
It is specifically designed as an Agentic Data Analyst. Instead of general software development, it is heavily trained to autonomously load datasets (CSV/Excel/JSON), write Python to perform EDA, generate charts, and summarize insights.
It averages 26 continuous, autonomous iterations to complete a full data pipeline with zero human intervention.
If you ever need an agent to crunch data or want to test an autonomous data workflow, I’d love for you to give this 9B analyst a spin! Thanks again for the huge support!
