tommytracx committed
Commit f361dc7 · verified · Parent: 56b568d

Upload 4 files

Files changed (4)
  1. Dockerfile +25 -0
  2. README.md +183 -8
  3. app.py +225 -0
  4. requirements.txt +3 -0
Dockerfile ADDED
@@ -0,0 +1,25 @@
+ FROM python:3.11-slim
+
+ WORKDIR /app
+
+ # Install system dependencies
+ RUN apt-get update && apt-get install -y \
+     curl \
+     && rm -rf /var/lib/apt/lists/*
+
+ # Copy requirements and install Python dependencies
+ COPY requirements.txt .
+ RUN pip install --no-cache-dir -r requirements.txt
+
+ # Copy application code
+ COPY . .
+
+ # Expose port
+ EXPOSE 7860
+
+ # Health check
+ HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
+     CMD curl -f http://localhost:7860/health || exit 1
+
+ # Run the application
+ CMD ["gunicorn", "--bind", "0.0.0.0:7860", "--workers", "1", "--timeout", "120", "app:app"]
README.md CHANGED
@@ -1,12 +1,187 @@
  ---
- title: Ollama Api
- emoji: 🌖
- colorFrom: gray
- colorTo: indigo
  sdk: docker
- pinned: false
- license: apache-2.0
- short_description: ollama-api
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: Ollama API Space
+ emoji: 🚀
+ colorFrom: blue
+ colorTo: purple
  sdk: docker
+ app_port: 7860
  ---

+ # 🚀 Ollama API Space
+
+ A Hugging Face Space that provides a REST API interface for Ollama models, allowing you to run local LLMs through a web API.
+
+ ## 🌟 Features
+
+ - **Model Management**: List and pull Ollama models
+ - **Text Generation**: Generate text using any available Ollama model
+ - **REST API**: Simple HTTP endpoints for easy integration
+ - **Health Monitoring**: Built-in health checks and status monitoring
+ - **OpenWebUI Integration**: Compatible with OpenWebUI for a full chat interface
+
+ ## 🚀 Quick Start
+
+ ### 1. Deploy to Hugging Face Spaces
+
+ 1. Fork this repository or create a new Space
+ 2. Upload these files to your Space
+ 3. Set the following environment variables in your Space settings:
+    - `OLLAMA_BASE_URL`: URL to your Ollama instance (e.g., `http://localhost:11434`)
+    - `ALLOWED_MODELS`: Comma-separated list of allowed models (optional)
+
+ ### 2. Local Development
+
+ ```bash
+ # Clone the repository
+ git clone <your-repo-url>
+ cd ollama-space
+
+ # Install dependencies
+ pip install -r requirements.txt
+
+ # Set environment variables
+ export OLLAMA_BASE_URL=http://localhost:11434
+
+ # Run the application
+ python app.py
+ ```
+
+ ## 📡 API Endpoints
+
+ ### GET `/api/models`
+ List all available Ollama models.
+
+ **Response:**
+ ```json
+ {
+   "status": "success",
+   "models": ["llama2", "codellama", "neural-chat"],
+   "count": 3
+ }
+ ```
+
+ ### POST `/api/models/pull`
+ Pull a model from Ollama.
+
+ **Request Body:**
+ ```json
+ {
+   "name": "llama2"
+ }
+ ```
+
+ **Response:**
+ ```json
+ {
+   "status": "success",
+   "model": "llama2"
+ }
+ ```
+
+ ### POST `/api/generate`
+ Generate text using a model.
+
+ **Request Body:**
+ ```json
+ {
+   "model": "llama2",
+   "prompt": "Hello, how are you?",
+   "temperature": 0.7,
+   "max_tokens": 100
+ }
+ ```
+
+ **Response:**
+ ```json
+ {
+   "status": "success",
+   "response": "Hello! I'm doing well, thank you for asking...",
+   "model": "llama2",
+   "usage": {
+     "prompt_tokens": 7,
+     "completion_tokens": 15,
+     "total_tokens": 22
+   }
+ }
+ ```
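+
+ For reference, here is a minimal Python client for this endpoint (a sketch; the base URL is a placeholder, substitute your Space's actual URL):
+
+ ```python
+ import requests
+
+ BASE_URL = "https://your-space-url.hf.space"  # placeholder: your Space's URL
+
+ payload = {
+     "model": "llama2",
+     "prompt": "Hello, how are you?",
+     "temperature": 0.7,
+     "max_tokens": 100,
+ }
+ # Timeout mirrors the 120s the server uses when calling Ollama
+ resp = requests.post(f"{BASE_URL}/api/generate", json=payload, timeout=120)
+ resp.raise_for_status()
+ data = resp.json()
+ print(data["response"])
+ ```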
+
+ ### GET `/health`
+ Health check endpoint.
+
+ **Response:**
+ ```json
+ {
+   "status": "healthy",
+   "ollama_connection": "connected",
+   "available_models": 3
+ }
+ ```
+
+ ## 🔧 Configuration
+
+ ### Environment Variables
+
+ - `OLLAMA_BASE_URL`: URL to your Ollama instance (default: `http://localhost:11434`)
+ - `MODELS_DIR`: Directory for storing models (default: `/models`)
+ - `ALLOWED_MODELS`: Comma-separated list of allowed models (default: the list under Supported Models below)
+
+ ### Supported Models
+
+ By default, the following models are allowed:
+ - `llama2`
+ - `llama2:13b`
+ - `llama2:70b`
+ - `codellama`
+ - `neural-chat`
+
+ You can customize this list by setting the `ALLOWED_MODELS` environment variable; the sketch below shows the parsing logic.
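+
+ The allow-list is read once at startup in `app.py`; a minimal sketch of that same parsing, shown here for clarity:
+
+ ```python
+ import os
+
+ # Mirrors app.py: a comma-separated env var overrides the built-in default list
+ ALLOWED_MODELS = os.getenv(
+     "ALLOWED_MODELS",
+     "llama2,llama2:13b,llama2:70b,codellama,neural-chat",
+ ).split(",")
+
+ print(ALLOWED_MODELS)  # e.g. ['llama2', 'llama2:13b', ...]
+ ```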
+
+ ## 🌐 Integration with OpenWebUI
+
+ This Space is designed to work seamlessly with OpenWebUI. You can:
+
+ 1. Use this Space as a backend API for OpenWebUI
+ 2. Configure OpenWebUI to connect to this Space's endpoints
+ 3. Enjoy a full chat interface with your local Ollama models
+
+ ## 🐳 Docker Support
+
+ The Space includes a Dockerfile for containerized deployment:
+
+ ```bash
+ # Build the image
+ docker build -t ollama-space .
+
+ # Run the container
+ docker run -p 7860:7860 -e OLLAMA_BASE_URL=http://host.docker.internal:11434 ollama-space
+ ```
+
+ ## 🔒 Security Considerations
+
+ - The Space only allows pulling models listed in `ALLOWED_MODELS`
+ - All API endpoints are publicly accessible (consider adding authentication for production use)
+ - The Space connects to your Ollama instance, so ensure proper network security
+
+ ## 🚨 Troubleshooting
+
+ ### Common Issues
+
+ 1. **Connection to Ollama failed**: Check that Ollama is running and accessible
+ 2. **Model not found**: Ensure the model is available in your Ollama instance
+ 3. **Timeout errors**: Large models may take time to load; increase timeout values
+
+ ### Health Check
+
+ Use the `/health` endpoint to monitor the Space's status and Ollama connection; a polling sketch follows below.
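+
+ For automated monitoring, a simple polling loop works against this endpoint (a sketch; same placeholder URL as above):
+
+ ```python
+ import time
+
+ import requests
+
+ BASE_URL = "https://your-space-url.hf.space"  # placeholder: your Space's URL
+
+ while True:
+     try:
+         body = requests.get(f"{BASE_URL}/health", timeout=5).json()
+         print(body["status"], "models:", body.get("available_models"))
+     except requests.RequestException as exc:
+         print("health check failed:", exc)
+     time.sleep(30)  # matches the Dockerfile HEALTHCHECK interval
+ ```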
+
+ ## 📝 License
+
+ This project is open source and available under the MIT License.
+
+ ## 🤝 Contributing
+
+ Contributions are welcome! Please feel free to submit a Pull Request.
+
+ ## 📞 Support
+
+ If you encounter any issues or have questions, please open an issue on the repository.
app.py ADDED
@@ -0,0 +1,225 @@
+ from flask import Flask, request, jsonify
+ import os
+ import subprocess
+ import json
+ import logging
+ from typing import Dict, Any, List
+ import requests
+
+ app = Flask(__name__)
+ logging.basicConfig(level=logging.INFO)
+
+ # Configuration
+ OLLAMA_BASE_URL = os.getenv('OLLAMA_BASE_URL', 'http://localhost:11434')
+ MODELS_DIR = os.getenv('MODELS_DIR', '/models')
+ ALLOWED_MODELS = os.getenv('ALLOWED_MODELS', 'llama2,llama2:13b,llama2:70b,codellama,neural-chat').split(',')
+
+ class OllamaManager:
+     def __init__(self, base_url: str):
+         self.base_url = base_url
+         self.available_models = []
+         self.refresh_models()
+
+     def refresh_models(self):
+         """Refresh the list of available models"""
+         try:
+             response = requests.get(f"{self.base_url}/api/tags", timeout=10)
+             if response.status_code == 200:
+                 data = response.json()
+                 self.available_models = [model['name'] for model in data.get('models', [])]
+             else:
+                 self.available_models = []
+         except Exception as e:
+             logging.error(f"Error refreshing models: {e}")
+             self.available_models = []
+
+     def list_models(self) -> List[str]:
+         """List all available models"""
+         self.refresh_models()
+         return self.available_models
+
+     def pull_model(self, model_name: str) -> Dict[str, Any]:
+         """Pull a model from Ollama"""
+         try:
+             response = requests.post(f"{self.base_url}/api/pull",
+                                      json={"name": model_name},
+                                      timeout=300)
+             if response.status_code == 200:
+                 return {"status": "success", "model": model_name}
+             else:
+                 return {"status": "error", "message": f"Failed to pull model: {response.text}"}
+         except Exception as e:
+             return {"status": "error", "message": str(e)}
+
+     def generate(self, model_name: str, prompt: str, **kwargs) -> Dict[str, Any]:
+         """Generate text using a model"""
+         try:
+             payload = {
+                 "model": model_name,
+                 "prompt": prompt,
+                 "stream": False
+             }
+             payload.update(kwargs)
+
+             response = requests.post(f"{self.base_url}/api/generate",
+                                      json=payload,
+                                      timeout=120)
+
+             if response.status_code == 200:
+                 data = response.json()
+                 return {
+                     "status": "success",
+                     "response": data.get('response', ''),
+                     "model": model_name,
+                     "usage": data.get('usage', {})
+                 }
+             else:
+                 return {"status": "error", "message": f"Generation failed: {response.text}"}
+         except Exception as e:
+             return {"status": "error", "message": str(e)}
+
+ # Initialize Ollama manager
+ ollama_manager = OllamaManager(OLLAMA_BASE_URL)
+
+ @app.route('/')
+ def home():
+     """Home page with API documentation"""
+     return '''
+     <!DOCTYPE html>
+     <html>
+     <head>
+         <title>Ollama API Space</title>
+         <style>
+             body { font-family: Arial, sans-serif; max-width: 800px; margin: 0 auto; padding: 20px; }
+             .endpoint { background: #f5f5f5; padding: 15px; margin: 10px 0; border-radius: 5px; }
+             .method { background: #007bff; color: white; padding: 2px 8px; border-radius: 3px; font-size: 12px; }
+             .url { font-family: monospace; background: #e9ecef; padding: 2px 6px; border-radius: 3px; }
+         </style>
+     </head>
+     <body>
+         <h1>🚀 Ollama API Space</h1>
+         <p>This Space provides API endpoints for Ollama model management and inference.</p>
+
+         <h2>Available Endpoints</h2>
+
+         <div class="endpoint">
+             <span class="method">GET</span> <span class="url">/api/models</span>
+             <p>List all available models</p>
+         </div>
+
+         <div class="endpoint">
+             <span class="method">POST</span> <span class="url">/api/models/pull</span>
+             <p>Pull a model from Ollama</p>
+             <p>Body: {"name": "model_name"}</p>
+         </div>
+
+         <div class="endpoint">
+             <span class="method">POST</span> <span class="url">/api/generate</span>
+             <p>Generate text using a model</p>
+             <p>Body: {"model": "model_name", "prompt": "your prompt"}</p>
+         </div>
+
+         <div class="endpoint">
+             <span class="method">GET</span> <span class="url">/health</span>
+             <p>Health check endpoint</p>
+         </div>
+
+         <h2>Usage Examples</h2>
+         <p>You can use this API with OpenWebUI or any other client that supports REST APIs.</p>
+
+         <h3>cURL Examples</h3>
+         <pre>
+ # List models
+ curl https://your-space-url.hf.space/api/models
+
+ # Generate text
+ curl -X POST https://your-space-url.hf.space/api/generate \
+      -H "Content-Type: application/json" \
+      -d '{"model": "llama2", "prompt": "Hello, how are you?"}'
+         </pre>
+     </body>
+     </html>
+     '''
+
+ @app.route('/api/models', methods=['GET'])
+ def list_models():
+     """List all available models"""
+     try:
+         models = ollama_manager.list_models()
+         return jsonify({
+             "status": "success",
+             "models": models,
+             "count": len(models)
+         })
+     except Exception as e:
+         return jsonify({"status": "error", "message": str(e)}), 500
+
+ @app.route('/api/models/pull', methods=['POST'])
+ def pull_model():
+     """Pull a model from Ollama"""
+     try:
+         data = request.get_json()
+         if not data or 'name' not in data:
+             return jsonify({"status": "error", "message": "Model name is required"}), 400
+
+         model_name = data['name']
+         if model_name not in ALLOWED_MODELS:
+             return jsonify({"status": "error", "message": f"Model {model_name} not in allowed list"}), 400
+
+         result = ollama_manager.pull_model(model_name)
+         if result["status"] == "success":
+             return jsonify(result), 200
+         else:
+             return jsonify(result), 500
+     except Exception as e:
+         return jsonify({"status": "error", "message": str(e)}), 500
+
+ @app.route('/api/generate', methods=['POST'])
+ def generate_text():
+     """Generate text using a model"""
+     try:
+         data = request.get_json()
+         if not data or 'model' not in data or 'prompt' not in data:
+             return jsonify({"status": "error", "message": "Model name and prompt are required"}), 400
+
+         model_name = data['model']
+         prompt = data['prompt']
+
+         # Forward any additional parameters (e.g., temperature, max_tokens) to Ollama
+         kwargs = {k: v for k, v in data.items() if k not in ['model', 'prompt']}
+
+         result = ollama_manager.generate(model_name, prompt, **kwargs)
+         if result["status"] == "success":
+             return jsonify(result), 200
+         else:
+             return jsonify(result), 500
+     except Exception as e:
+         return jsonify({"status": "error", "message": str(e)}), 500
+
+ @app.route('/health', methods=['GET'])
+ def health_check():
+     """Health check endpoint"""
+     try:
+         # Try to connect to Ollama
+         response = requests.get(f"{OLLAMA_BASE_URL}/api/tags", timeout=5)
+         if response.status_code == 200:
+             return jsonify({
+                 "status": "healthy",
+                 "ollama_connection": "connected",
+                 "available_models": len(ollama_manager.available_models)
+             })
+         else:
+             return jsonify({
+                 "status": "unhealthy",
+                 "ollama_connection": "failed",
+                 "error": f"Ollama returned status {response.status_code}"
+             }), 503
+     except Exception as e:
+         return jsonify({
+             "status": "unhealthy",
+             "ollama_connection": "failed",
+             "error": str(e)
+         }), 503
+
+ if __name__ == '__main__':
+     app.run(host='0.0.0.0', port=7860, debug=False)
requirements.txt ADDED
@@ -0,0 +1,3 @@
+ flask==2.3.3
+ requests==2.31.0
+ gunicorn==21.2.0