|
|
--- |
|
|
title: Docker Model Runner |
|
|
emoji: 🐳
|
|
colorFrom: blue |
|
|
colorTo: purple |
|
|
sdk: docker |
|
|
app_port: 7860 |
|
|
suggested_hardware: cpu-basic |
|
|
pinned: false |
|
|
--- |
|
|
|
|
|
# Docker Model Runner |
|
|
|
|
|
An **Anthropic API-compatible** endpoint with **interleaved thinking** support.
|
|
|
|
|
## Hardware |
|
|
- **CPU Basic**: 2 vCPU · 16 GB RAM
|
|
|
|
|
## Quick Start |
|
|
|
|
|
```bash |
|
|
pip install anthropic |
|
|
export ANTHROPIC_BASE_URL=https://likhonsheikhdev-docker-model-runner.hf.space |
|
|
export ANTHROPIC_API_KEY=any-key |
|
|
``` |
|
|
|
|
|
```python |
|
|
import anthropic |
|
|
|
|
|
client = anthropic.Anthropic() |
|
|
|
|
|
message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1000,
    system="You are a helpful assistant.",
    messages=[{"role": "user", "content": "Hi, how are you?"}],
)
|
|
|
|
|
for block in message.content:
    if block.type == "thinking":
        print(f"Thinking:\n{block.thinking}\n")
    elif block.type == "text":
        print(f"Text:\n{block.text}\n")
|
|
``` |
|
|
|
|
|
## Interleaved Thinking |
|
|
|
|
|
Enable thinking to get reasoning steps interleaved with responses: |
|
|
|
|
|
```python |
|
|
import anthropic |
|
|
|
|
|
client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)
|
|
|
|
|
message = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={
        "type": "enabled",
        "budget_tokens": 200,
    },
    messages=[{"role": "user", "content": "Explain quantum computing"}],
)
|
|
|
|
|
# Response contains interleaved thinking and text blocks
for block in message.content:
    if block.type == "thinking":
        print(f"Thinking: {block.thinking}")
    elif block.type == "text":
        print(f"Response: {block.text}")
|
|
``` |
|
|
|
|
|
## Streaming with Thinking |
|
|
|
|
|
```python |
|
|
import anthropic |
|
|
|
|
|
client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)
|
|
|
|
|
with client.messages.stream(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=[{"role": "user", "content": "Hello!"}],
) as stream:
    for event in stream:
        if hasattr(event, "type"):
            if event.type == "content_block_start":
                print(f"\n[{event.content_block.type}]", end=" ")
            elif event.type == "content_block_delta":
                if hasattr(event.delta, "thinking"):
                    print(event.delta.thinking, end="")
                elif hasattr(event.delta, "text"):
                    print(event.delta.text, end="")
|
|
``` |
|
|
|
|
|
## Multi-Turn with Thinking History |
|
|
|
|
|
**Important**: In multi-turn conversations, append the complete model response (including thinking blocks) to the message history to maintain reasoning-chain continuity.
|
|
|
|
|
```python |
|
|
import anthropic |
|
|
|
|
|
client = anthropic.Anthropic(
    base_url="https://likhonsheikhdev-docker-model-runner.hf.space"
)
|
|
|
|
|
messages = [{"role": "user", "content": "What is 2+2?"}] |
|
|
|
|
|
# First turn |
|
|
response = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages,
)
|
|
|
|
|
# Append full response (including thinking) to history |
|
|
messages.append({
    "role": "assistant",
    "content": response.content,  # includes both thinking and text blocks
})
|
|
|
|
|
# Second turn |
|
|
messages.append({"role": "user", "content": "Now multiply that by 3"}) |
|
|
|
|
|
response2 = client.messages.create(
    model="MiniMax-M2",
    max_tokens=1024,
    thinking={"type": "enabled", "budget_tokens": 100},
    messages=messages,
)
|
|
``` |
|
|
|
|
|
## Supported Models |
|
|
|
|
|
| Model | Description | |
|
|
|-------|-------------| |
|
|
| MiniMax-M2 | Agentic capabilities, advanced reasoning |
|
|
| MiniMax-M2-Stable | High concurrency and commercial use | |
|
|
|
|
|
## API Compatibility |
|
|
|
|
|
### Parameters |
|
|
|
|
|
| Parameter | Status |
|-----------|--------|
| model | ✅ Fully supported |
| messages | ✅ Partial (text, tool calls) |
| max_tokens | ✅ Fully supported |
| stream | ✅ Fully supported |
| system | ✅ Fully supported |
| temperature | ✅ Range (0.0, 1.0] |
| thinking | ✅ Fully supported |
| thinking.budget_tokens | ✅ Fully supported |
| tools | ✅ Fully supported |
| tool_choice | ✅ Fully supported |
| top_p | ✅ Fully supported |
| metadata | ✅ Fully supported |
| top_k | ⚪ Ignored |
| stop_sequences | ⚪ Ignored |
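The supported parameters above can be combined into a single request body. A minimal sketch of such a payload (field names follow the Anthropic Messages API; the concrete values here are illustrative):

```python
import json

# Illustrative request body using only parameters the table marks as
# supported; top_k and stop_sequences are omitted because the server
# ignores them.
payload = {
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "system": "You are a helpful assistant.",
    "temperature": 0.7,  # must be in (0.0, 1.0]
    "top_p": 0.95,
    "thinking": {"type": "enabled", "budget_tokens": 200},
    "messages": [{"role": "user", "content": "Explain AI briefly"}],
}

body = json.dumps(payload)  # ready to POST to /v1/messages
```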
|
|
|
|
|
### Message Types |
|
|
|
|
|
| Type | Status | |
|
|
|------|--------| |
|
|
| text | ✅ Supported |
| thinking | ✅ Supported |
| tool_use | ✅ Supported |
| tool_result | ✅ Supported |
| image | ❌ Not supported |
| document | ❌ Not supported |
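The supported block types above can be handled with a simple dispatch loop. A sketch over plain dicts (the SDK returns typed objects carrying the same `type` field; the sample blocks below are illustrative, not captured from the server):

```python
def render_block(block: dict) -> str:
    """Format one content block according to its type."""
    kind = block["type"]
    if kind == "text":
        return f"Text: {block['text']}"
    if kind == "thinking":
        return f"Thinking: {block['thinking']}"
    if kind == "tool_use":
        return f"Tool call: {block['name']}({block['input']})"
    if kind == "tool_result":
        return f"Tool result: {block['content']}"
    raise ValueError(f"Unsupported block type: {kind}")  # image, document, ...

# Illustrative content array mirroring a /v1/messages response
sample = [
    {"type": "thinking", "thinking": "The user greeted me."},
    {"type": "text", "text": "Hello!"},
]
lines = [render_block(b) for b in sample]
```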
|
|
|
|
|
## Endpoints |
|
|
|
|
|
| Endpoint | Method | Description | |
|
|
|----------|--------|-------------| |
|
|
| `/v1/messages` | POST | Anthropic Messages API | |
|
|
| `/v1/chat/completions` | POST | OpenAI Chat API | |
|
|
| `/v1/models` | GET | List models | |
|
|
| `/health` | GET | Health check | |
|
|
| `/info` | GET | API info | |
|
|
|
|
|
## cURL Example |
|
|
|
|
|
```bash |
|
|
curl -X POST https://likhonsheikhdev-docker-model-runner.hf.space/v1/messages \
  -H "Content-Type: application/json" \
  -H "x-api-key: any-key" \
  -d '{
    "model": "MiniMax-M2",
    "max_tokens": 1024,
    "thinking": {"type": "enabled", "budget_tokens": 100},
    "messages": [
      {"role": "user", "content": "Explain AI briefly"}
    ]
  }'
|
|
``` |
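The JSON returned by the call above carries a `content` array of typed blocks. A sketch of extracting just the visible text with the standard library (the sample response below is illustrative, not captured from the server):

```python
import json

# Illustrative response shape for a thinking-enabled /v1/messages call
raw = """{
  "id": "msg_123",
  "role": "assistant",
  "content": [
    {"type": "thinking", "thinking": "Define AI, keep it short."},
    {"type": "text", "text": "AI is software that performs tasks requiring human-like judgment."}
  ]
}"""

response = json.loads(raw)
# Keep only the text blocks, skipping the thinking block
text = "".join(b["text"] for b in response["content"] if b["type"] == "text")
```

On the command line, the same extraction is commonly done with `jq '.content[] | select(.type == "text") | .text'`.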
|
|
|