---
title: Docker Model Runner
emoji: 🐳
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
suggested_hardware: cpu-basic
pinned: false
---
# Docker Model Runner
An **Anthropic API-compatible** model server with **interleaved thinking** support.
## Hardware
- **CPU Basic**: 2 vCPU · 16 GB RAM
## Quick Start
```bash
pip install anthropic
export ANTHROPIC_BASE_URL=https://likhonsheikhdev-docker-model-runner.hf.space
export ANTHROPIC_API_KEY=any-key
```
```python
import anthropic
# Reads ANTHROPIC_BASE_URL and ANTHROPIC_API_KEY from the environment
client = anthropic.Anthropic()

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}]
)

for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")
```
## Interleaved Thinking
Enable thinking to get reasoning steps interleaved with responses:
```python
import anthropic
client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space",
    api_key="any-key"  # any value works (see Quick Start)
)

message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 200
    },
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)

# Response contains interleaved thinking and text blocks
for block in message.content:
    if block.type == "thinking":
        print(f"💭 Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"📝 Response: {block.text}")
```
## Streaming with Thinking
```python
import anthropic
client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space",
    api_key="any-key"
)

with client.messages.stream(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=[{"role": "user", "content": "Hello!"}]
) as stream:
    for event in stream:
        if hasattr(event, "type"):
            if event.type == "content_block_start":
                print(f"\n[{event.content_block.type}]", end=" ")
            elif event.type == "content_block_delta":
                if hasattr(event.delta, "thinking"):
                    print(event.delta.thinking, end="")
                elif hasattr(event.delta, "text"):
                    print(event.delta.text, end="")
```
## Multi-Turn with Thinking History
**Important**: In multi-turn conversations, append the complete model response (including thinking blocks) to maintain reasoning chain continuity.
```python
import anthropic
client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space",
    api_key="any-key"
)

messages = [{"role": "user", "content": "What is 2+2?"}]

# First turn
response = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)

# Append full response (including thinking) to history
messages.append({
    "role": "assistant",
    "content": response.content  # includes both thinking and text blocks
})

# Second turn
messages.append({"role": "user", "content": "Now multiply that by 3"})
response2 = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages
)
```
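The resulting history alternates user and assistant turns, with each assistant entry carrying its thinking blocks. A stdlib-only sketch of that shape, using plain dicts in place of SDK response objects (`append_assistant_turn` is a hypothetical helper, not part of the API):

```python
def append_assistant_turn(messages, content_blocks):
    """Append the full assistant response (thinking + text) to the history."""
    messages.append({"role": "assistant", "content": content_blocks})
    return messages

messages = [{"role": "user", "content": "What is 2+2?"}]

# A response with interleaved thinking and text, as plain dicts:
first_response = [
    {"type": "thinking", "thinking": "2 + 2 = 4"},
    {"type": "text", "text": "The answer is 4."},
]
append_assistant_turn(messages, first_response)
messages.append({"role": "user", "content": "Now multiply that by 3"})

roles = [m["role"] for m in messages]
print(roles)  # ['user', 'assistant', 'user']
```

Dropping the thinking blocks from the assistant entry still produces a valid request, but breaks the reasoning chain the model relies on in later turns.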
## Supported Models
| Model | Description |
|-------|-------------|
| MiniMax-M2 | Agentic capabilities, advanced reasoning |
| MiniMax-M2-Stable | High concurrency and commercial use |
## API Compatibility
### Parameters
| Parameter | Status |
|-----------|--------|
| model | ✅ Fully supported |
| messages | ✅ Partial (text, tool calls) |
| max_tokens | ✅ Fully supported |
| stream | ✅ Fully supported |
| system | ✅ Fully supported |
| temperature | ✅ Range (0.0, 1.0] |
| thinking | ✅ Fully supported |
| thinking.budget_tokens | ✅ Fully supported |
| tools | ✅ Fully supported |
| tool_choice | ✅ Fully supported |
| top_p | ✅ Fully supported |
| metadata | ✅ Fully supported |
| top_k | ⚪ Ignored |
| stop_sequences | ⚪ Ignored |
### Message Types
| Type | Status |
|------|--------|
| text | ✅ Supported |
| thinking | ✅ Supported |
| tool_use | ✅ Supported |
| tool_result | ✅ Supported |
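Since `tools` and `tool_use`/`tool_result` blocks are supported, a tool-enabled request is built like any other Anthropic Messages call. A sketch of the request payload (the `get_weather` tool is a made-up example; send it with `client.messages.create(**payload)` or POST it as JSON to `/v1/messages`):

```python
import json

# A minimal tool definition in Anthropic's tool schema (get_weather is illustrative).
payload = {
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "tools": [
        {
            "name": "get_weather",
            "description": "Get the current weather for a city.",
            "input_schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        }
    ],
    "messages": [{"role": "user", "content": "What's the weather in Dhaka?"}],
}

# The payload serializes cleanly to the JSON body /v1/messages expects.
body = json.dumps(payload)
```

When the model decides to call the tool, the response contains a `tool_use` block; you run the tool yourself and send the result back as a `tool_result` block in the next user turn.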
| image | ❌ Not supported |
| document | ❌ Not supported |
## Endpoints
| Endpoint | Method | Description |
|----------|--------|-------------|
| `/v1/messages` | POST | Anthropic Messages API |
| `/v1/chat/completions` | POST | OpenAI Chat API |
| `/v1/models` | GET | List models |
| `/health` | GET | Health check |
| `/info` | GET | API info |
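Because the Space also serves `/v1/chat/completions`, the same prompt can be sent in OpenAI format; for plain text only the envelope differs. A stdlib-only sketch of the two equivalent request bodies (POST them to the endpoints above):

```python
prompt = "Explain AI briefly"
system_prompt = "You are a helpful assistant."

# Anthropic Messages API body (POST /v1/messages)
anthropic_body = {
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "system": system_prompt,  # system prompt is a top-level field
    "messages": [{"role": "user", "content": prompt}],
}

# OpenAI Chat Completions body (POST /v1/chat/completions)
openai_body = {
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "messages": [
        {"role": "system", "content": system_prompt},  # system prompt is a message
        {"role": "user", "content": prompt},
    ],
}
```

The main translation point: Anthropic hoists the system prompt into a top-level `system` field, while OpenAI keeps it as the first message.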
## cURL Example
```bash
curl -X POST https://likhonsheikhdev-docker-model-runner.hf.space/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -d '{
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 100},
    "messages": [
      {"role": "user", "content": "Explain AI briefly"}
    ]
  }'
```