ratandeep committed
Commit a3664fa · 1 parent: f6337fa

Deploy Gradio AI video analysis app

Files changed (3)
  1. README.md +92 -7
  2. gradio_ai_enhanced.py +578 -0
  3. requirements.txt +11 -0
README.md CHANGED
@@ -1,12 +1,97 @@
  ---
- title: Ai Video Analysis
- emoji: 💻
- colorFrom: gray
- colorTo: yellow
  sdk: gradio
- sdk_version: 5.49.1
- app_file: app.py
  pinned: false
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

  ---
+ title: AI Video Analysis
+ emoji: 🎥
+ colorFrom: blue
+ colorTo: green
  sdk: gradio
+ sdk_version: 5.0.0
+ app_file: gradio_ai_enhanced.py
  pinned: false
+ license: mit
  ---

+ # 🎥 AI-Enhanced Video Analysis
+
+ Real-time object detection from your webcam, with AI-powered queries about what the camera has seen, using GPT-4o-mini and vector search.
+
+ ## 🚀 Features
+
+ - **Live Object Detection**: YOLOv8 analyzes your webcam feed in real time
+ - **Color Recognition**: Identifies object colors (red, blue, green, etc.)
+ - **AI Queries**: Ask questions about what appeared in the video
+ - **Vector Search**: Semantic search through video history using ChromaDB
+ - **Frame Chunking**: Automatic grouping of video events for efficient storage
+
+ ## 🎯 How to Use
+
+ 1. **Enter your OpenAI API key** in the text box and click "Connect"
+    - Get a key from: https://platform.openai.com/api-keys
+    - Alternatively, the Space admin can set it as a repository secret
+
+ 2. **Click the webcam button** to start video streaming
+    - Allow camera permissions when prompted
+    - Wait a few seconds for the YOLO model to load (first time only)
+
+ 3. **Watch objects being detected** in real time, with bounding boxes and labels
+
+ 4. **Ask questions** about the video:
+    - "What objects have appeared in the last minute?"
+    - "When did you see a red object?"
+    - "How many different objects were detected?"
+
+ ## 🔧 Technical Stack
+
+ - **YOLOv8**: Real-time object detection
+ - **Gradio WebRTC**: Smooth video streaming with Cloudflare TURN servers
+ - **OpenAI GPT-4o-mini**: Natural language query understanding
+ - **OpenAI Embeddings**: Semantic search capabilities
+ - **ChromaDB**: Vector database for storing video events (see the minimal pipeline sketch below)
+
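+ The core loop is small enough to sketch. Below is a minimal, illustrative version of the detect → describe → embed → query pipeline; names are simplified, and the real implementation lives in `gradio_ai_enhanced.py`:
+
+ ```python
+ from openai import OpenAI
+ import chromadb
+
+ client = OpenAI()  # reads OPENAI_API_KEY from the environment
+ events = chromadb.Client().get_or_create_collection("video_events")
+
+ def embed(text):
+     """Embed a short event description for semantic search."""
+     res = client.embeddings.create(model="text-embedding-3-small", input=text)
+     return res.data[0].embedding
+
+ # Store one detection event (in the app, this string is built from YOLO output)
+ desc = "At 12:01:33: Detected red cup, blue book"
+ events.add(documents=[desc], embeddings=[embed(desc)], ids=["chunk_0"])
+
+ # Retrieve the closest stored events for a question and let GPT answer over them
+ question = "When was the red cup visible?"
+ hits = events.query(query_embeddings=[embed(question)], n_results=5)
+ context = "\n".join(hits["documents"][0])
+ reply = client.chat.completions.create(
+     model="gpt-4o-mini",
+     messages=[{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}],
+ )
+ print(reply.choices[0].message.content)
+ ```
+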
+ ## 💰 Costs
+
+ - **Hugging Face Spaces**: Free (this Space)
+ - **Cloudflare TURN servers**: Free 10 GB/month via Gradio FastRTC
+ - **OpenAI API**: Pay-as-you-go
+   - Embeddings: ~$0.0001 per chunk
+   - GPT-4o-mini: ~$0.0001 per query
+   - Typical usage: <$1/month for moderate use (see the estimate sketched below)
+
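+ Those per-call figures are rough upper bounds, so a back-of-envelope check (assumed usage pattern, not a measurement) looks like this:
+
+ ```python
+ # Assumed prices from the list above; check current OpenAI pricing before relying on this
+ embed_cost, query_cost = 0.0001, 0.0001  # $ per chunk / per query
+ chunks = 2 * 3600                        # ~1 chunk per second over 2 hours of streaming
+ queries = 100
+ print(f"~${chunks * embed_cost + queries * query_cost:.2f}/month")  # ~$0.73
+ ```
+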
+ ## 🛠️ Local Development
+
+ ```bash
+ # Clone the repo
+ git clone https://github.com/ratandeepbansal/yolo2.git
+ cd yolo2
+
+ # Install dependencies
+ pip install -r requirements_gradio.txt
+
+ # Set up the API key
+ cp .env.example .env
+ # Edit .env and add your OpenAI API key
+
+ # Run the app
+ python gradio_ai_enhanced.py
+ ```
+
+ ## 📝 Notes
+
+ - First load takes ~30-60 seconds while the YOLOv8n model (~6 MB) downloads
+ - WebRTC works best in Chrome/Edge browsers
+ - Camera permissions are required for webcam access
+ - HTTPS is required (provided automatically by HF Spaces)
+
+ ## 🤝 Contributing
+
+ This is an open-source project. Feel free to:
+ - Report issues
+ - Suggest features
+ - Submit pull requests
+
+ ## 📄 License
+
+ MIT License - see the LICENSE file for details
+
+ ---
+
+ Built with ❤️ using Gradio and YOLOv8
gradio_ai_enhanced.py ADDED
@@ -0,0 +1,578 @@
+ """
+ AI-Enhanced Video Analysis with Gradio Live Video
+ Features: Real-time YOLO detection, GPT queries, Vector DB storage
+ Optimized for Hugging Face Spaces deployment
+ """
+
+ import gradio as gr
+ import cv2
+ import numpy as np
+ from collections import deque
+ import time
+ from datetime import datetime
+ import json
+ import os
+ from threading import Lock
+
+ # Ensure Ultralytics writes settings/cache inside the project workspace
+ ULTRALYTICS_BASE = os.path.join(os.path.dirname(__file__), ".ultralytics")
+ os.environ.setdefault("ULTRALYTICS_SETTINGS_DIR", ULTRALYTICS_BASE)
+ os.environ.setdefault("ULTRALYTICS_CACHE_DIR", os.path.join(ULTRALYTICS_BASE, "cache"))
+ os.makedirs(os.environ["ULTRALYTICS_SETTINGS_DIR"], exist_ok=True)
+ os.makedirs(os.environ["ULTRALYTICS_CACHE_DIR"], exist_ok=True)
+
+ # AI & vector DB imports
+ from openai import OpenAI
+ import chromadb
+ from chromadb.config import Settings
+
+ # YOLO import
+ try:
+     from ultralytics import YOLO
+     YOLO_AVAILABLE = True
+ except ImportError:
+     YOLO_AVAILABLE = False
+
+ # Global state management
+ class VideoAnalysisState:
+     def __init__(self):
+         self.lock = Lock()
+         self.frame_chunks = deque(maxlen=100)
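+         # maxlen-bounded deques cap memory use; the oldest entries are dropped automatically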
+         self.chunk_id = 0
+         self.detected_objects = []
+         self.pending_chunks = []
+         self.event_log = deque(maxlen=50)
+         self.openai_client = None
+         self.chroma_client = None
+         self.video_collection = None
+         self.model = None
+         self.frames_processed = 0
+
+     def init_openai(self, api_key):
+         """Initialize OpenAI client"""
+         if not api_key:
+             return False
+         try:
+             self.openai_client = OpenAI(api_key=api_key)
+             # Test the connection
+             self.openai_client.models.list()
+             return True
+         except Exception as e:
+             self.event_log.append(f"❌ OpenAI error: {str(e)[:50]}")
+             return False
+
+     def init_vector_db(self):
+         """Initialize ChromaDB"""
+         try:
+             self.chroma_client = chromadb.Client(Settings(
+                 anonymized_telemetry=False,
+                 allow_reset=True
+             ))
+             self.video_collection = self.chroma_client.get_or_create_collection(
+                 name="video_events",
+                 metadata={"hnsw:space": "cosine"}
+             )
+             return True
+         except Exception as e:
+             self.event_log.append(f"❌ Vector DB error: {str(e)[:50]}")
+             return False
+
+     def init_yolo(self):
+         """Initialize YOLO model"""
+         if YOLO_AVAILABLE and self.model is None:
+             try:
+                 self.model = YOLO('yolov8n.pt')
+                 self.event_log.append("✓ YOLO model loaded")
+                 return True
+             except Exception as e:
+                 self.event_log.append(f"❌ YOLO error: {str(e)[:50]}")
+                 return False
+         return self.model is not None
+
+ # Global state
+ state = VideoAnalysisState()
+
+ def get_dominant_color(image_region):
+     """Get dominant color from image region"""
+     if image_region.size == 0:
+         return "unknown"
+
+     hsv = cv2.cvtColor(image_region, cv2.COLOR_BGR2HSV)
+     h = np.mean(hsv[:, :, 0])
+     s = np.mean(hsv[:, :, 1])
+     v = np.mean(hsv[:, :, 2])
+
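+     # OpenCV hue is on a 0-179 scale (not 0-359); low saturation means the region
+     # is effectively achromatic, so classify black/white/gray by brightness first.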
+     if s < 40:
+         if v < 50:
+             return "black"
+         elif v > 200:
+             return "white"
+         else:
+             return "gray"
+
+     if h < 10 or h > 160:
+         return "red"
+     elif h < 25:
+         return "orange"
+     elif h < 35:
+         return "yellow"
+     elif h < 85:
+         return "green"
+     elif h < 125:
+         return "blue"
+     elif h < 155:
+         return "purple"
+     else:
+         return "pink"
+
+ def process_frame(frame):
+     """Process video frame with YOLO detection"""
+     if frame is None:
+         return gr.update(value=None, visible=False)
+
+     if state.model is None:
+         return gr.update(value=frame, visible=True)
+
+     # Convert incoming RGB frame to BGR for OpenCV/YOLO processing
+     frame_bgr = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
+
+     try:
+         # Run YOLO detection
+         results = state.model(frame_bgr, conf=0.4, verbose=False)
+
+         detected_objects = []
+         events_text = []
+
+         for r in results:
+             boxes = r.boxes
+             if boxes is not None:
+                 for box in boxes:
+                     x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
+                     conf = box.conf[0].item()
+                     cls = int(box.cls[0].item())
+                     label = state.model.names[cls]
+
+                     # Get color
+                     try:
+                         roi = frame_bgr[y1:y2, x1:x2]
+                         color = get_dominant_color(roi)
+                     except Exception:
+                         color = "unknown"
+
+                     detected_objects.append({
+                         'label': label,
+                         'color': color,
+                         'confidence': conf,
+                         'bbox': (x1, y1, x2, y2)
+                     })
+
+                     events_text.append(f"{color} {label}")
+
+                     # Draw bounding box
+                     cv2.rectangle(frame_bgr, (x1, y1), (x2, y2), (0, 255, 0), 2)
+
+                     # Draw label
+                     text = f"{color} {label} {conf:.2f}"
+                     cv2.putText(frame_bgr, text, (x1, y1 - 10),
+                                 cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
+
+         # Update state (thread-safe)
+         with state.lock:
+             state.detected_objects = detected_objects
+             state.frames_processed += 1
+
+             # Create chunks every 30 frames
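+             # (~1 second of video at 30 fps; a chunk is logged only when something was detected)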
+             if state.chunk_id % 30 == 0 and events_text:
+                 chunk_description = f"At {datetime.now().strftime('%H:%M:%S')}: Detected {', '.join(events_text)}"
+
+                 state.frame_chunks.append({
+                     'id': state.chunk_id,
+                     'timestamp': time.time(),
+                     'description': chunk_description,
+                     'objects': detected_objects.copy()
+                 })
+
+                 state.pending_chunks.append({
+                     'id': state.chunk_id,
+                     'description': chunk_description,
+                     'timestamp': time.time(),
+                     'object_count': len(detected_objects)
+                 })
+
+             state.chunk_id += 1
+             chunk_count = len(state.frame_chunks)
+
+         # Add stats overlay
+         cv2.putText(frame_bgr, f"Objects: {len(detected_objects)} | Chunks: {chunk_count}",
+                     (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)
+
+         # Convert back to RGB for display in Gradio
+         frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
+         return gr.update(value=frame_rgb, visible=True)
+
+     except Exception as e:
+         state.event_log.append(f"❌ Frame error: {str(e)[:50]}")
+         return gr.update(value=frame, visible=True)
+
+ def get_embedding(text):
+     """Get embeddings from OpenAI"""
+     if not state.openai_client:
+         return None
+     try:
+         response = state.openai_client.embeddings.create(
+             model="text-embedding-3-small",
+             input=text
+         )
+         return response.data[0].embedding
+     except Exception as e:
+         state.event_log.append(f"❌ Embedding error: {str(e)[:50]}")
+         return None
+
+ def process_pending_chunks():
+     """Process chunks waiting to be embedded"""
+     with state.lock:
+         if not state.pending_chunks or not state.video_collection:
+             return 0
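+         # Embed at most five pending chunks per call so a query never blocks on a long backlog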
+         chunks_to_process = state.pending_chunks[:5]
+
+     processed = 0
+     for chunk in chunks_to_process:
+         try:
+             embedding = get_embedding(chunk['description'])
+             if embedding:
+                 state.video_collection.add(
+                     documents=[chunk['description']],
+                     embeddings=[embedding],
+                     ids=[f"chunk_{chunk['id']}"],
+                     metadatas=[{
+                         'timestamp': chunk['timestamp'],
+                         'object_count': chunk['object_count']
+                     }]
+                 )
+                 with state.lock:
+                     state.pending_chunks.remove(chunk)
+                 processed += 1
+         except Exception as e:
+             state.event_log.append(f"❌ Embed error: {str(e)[:30]}")
+             break
+
+     return processed
+
+ def query_with_ai(question):
+     """Answer questions using GPT with vector database context"""
+     if not state.openai_client:
+         return "⚠️ Please enter your OpenAI API key first."
+
+     if not question or not question.strip():
+         return "⚠️ Please enter a question."
+
+     try:
+         # Process pending chunks
+         with state.lock:
+             has_pending = len(state.pending_chunks) > 0
+
+         if has_pending:
+             processed = process_pending_chunks()
+             if processed > 0:
+                 state.event_log.append(f"✓ Embedded {processed} chunks")
+
+         # Get context from vector DB
+         context_docs = []
+         if state.video_collection:
+             question_embedding = get_embedding(question)
+             if question_embedding:
+                 results = state.video_collection.query(
+                     query_embeddings=[question_embedding],
+                     n_results=5
+                 )
+                 if results and results['documents']:
+                     context_docs = results['documents'][0]
+
+         context = "\n".join(context_docs) if context_docs else "No video events stored yet."
+
+         # Get current state
+         with state.lock:
+             current_objects = state.detected_objects.copy()
+             frames_seen = state.frames_processed
+
+         if current_objects:
+             obj_descriptions = [f"{o['color']} {o['label']}" for o in current_objects]
+             current_state = f"Currently visible: {', '.join(obj_descriptions)}"
+         elif frames_seen > 0:
+             current_state = "Video stream active but no objects detected in the latest frame."
+         else:
+             current_state = "No video frames processed yet."
+
+         if not context_docs and frames_seen > 0:
+             context = "Video stream active, waiting for notable detections to log."
+
+         # Create prompt
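+         # (retrieval-augmented generation: retrieved chunk descriptions plus the
+         # live-frame summary serve as grounding context for the model)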
+         prompt = f"""You are a video analysis assistant. Answer the question based on the video footage context.
+
+ Video Event History (from vector database):
+ {context}
+
+ Current Frame:
+ {current_state}
+
+ Question: {question}
+
+ Provide a concise, helpful answer based on the video data."""
+
+         # Call GPT
+         response = state.openai_client.chat.completions.create(
+             model="gpt-4o-mini",
+             messages=[
+                 {"role": "system", "content": "You are a helpful video analysis assistant."},
+                 {"role": "user", "content": prompt}
+             ],
+             temperature=0.7,
+             max_tokens=200
+         )
+
+         answer = response.choices[0].message.content
+         state.event_log.append("✓ Query answered")
+         return f"**AI Answer:**\n\n{answer}"
+
+     except Exception as e:
+         error_msg = f"Error querying AI: {str(e)}"
+         state.event_log.append(f"❌ Query error: {str(e)[:30]}")
+         return error_msg
+
+ def setup_api_key(api_key):
+     """Set up the OpenAI API key and initialize services"""
+     if not api_key or not api_key.strip():
+         return "⚠️ Please enter a valid API key", get_stats()
+
+     success = state.init_openai(api_key)
+     if success:
+         state.init_vector_db()
+         state.init_yolo()
+         return "✅ OpenAI connected! Vector DB initialized!", get_stats()
+     else:
+         return "❌ Failed to connect to OpenAI. Check your API key.", get_stats()
+
+ def get_stats():
+     """Get current system statistics"""
+     with state.lock:
+         chunks = len(state.frame_chunks)
+         objects = len(state.detected_objects)
+         pending = len(state.pending_chunks)
+
+     vector_count = 0
+     if state.video_collection:
+         try:
+             vector_count = state.video_collection.count()
+         except Exception:
+             vector_count = 0
+
+     stats = f"""**System Status:**
+ - Chunks Stored: {chunks}
+ - Current Objects: {objects}
+ - Pending Embeddings: {pending}
+ - Vector DB Entries: {vector_count}
+ """
+     return stats
+
+ def get_current_detections():
+     """Get list of currently detected objects"""
+     with state.lock:
+         current = state.detected_objects.copy()
+
+     if not current:
+         return "No objects detected"
+
+     output = "**Current Detections:**\n\n"
+     for i, obj in enumerate(current):
+         output += f"{i + 1}. {obj['color']} {obj['label']} ({obj['confidence']:.2f})\n"
+
+     return output
+
+ def get_recent_chunks():
+     """Get recent video chunks"""
+     with state.lock:
+         recent = list(state.frame_chunks)[-5:]
+
+     if not recent:
+         return "No chunks yet - start the video!"
+
+     output = "**Recent Video Chunks:**\n\n"
+     for chunk in recent:
+         output += f"[{chunk['id']}] {chunk['description']}\n\n"
+
+     return output
+
+ def get_event_log():
+     """Get recent event log"""
+     with state.lock:
+         events = list(state.event_log)[-10:]
+
+     if not events:
+         return "No events yet"
+
+     return "\n".join(events)
+
+ # Initialize YOLO on startup
+ state.init_yolo()
+
+ # Build Gradio interface
+ with gr.Blocks(title="AI Video Analysis", theme=gr.themes.Soft()) as demo:
+     gr.Markdown("# 🎥 AI-Enhanced Video Analysis")
+     gr.Markdown("*Real-time object detection with GPT queries and vector database storage*")
+
+     with gr.Row():
+         # Left column - video and controls
+         with gr.Column(scale=2):
+             gr.Markdown("## 📹 Live Video Feed")
+
+             # API key setup
+             with gr.Row():
+                 api_key_input = gr.Textbox(
+                     label="OpenAI API Key",
+                     type="password",
+                     placeholder="sk-...",
+                     scale=3
+                 )
+                 setup_btn = gr.Button("Connect", scale=1, variant="primary")
+
+             api_status = gr.Markdown("⚠️ Enter your OpenAI API key to enable AI features")
+
+             # Live video stream
+             if YOLO_AVAILABLE:
+                 processed_feed = gr.Image(
+                     label="YOLO Detection Feed",
+                     interactive=False,
+                     type="numpy",
+                     visible=False
+                 )
+                 webcam_stream = gr.Image(
+                     label="Webcam Stream",
+                     sources=["webcam"],
+                     streaming=True,
+                     type="numpy"
+                 )
+                 webcam_stream.stream(
+                     fn=process_frame,
+                     inputs=webcam_stream,
+                     outputs=processed_feed
+                 )
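+                 # Each webcam frame arrives as an RGB numpy array; process_frame returns a
+                 # gr.update that reveals and refreshes the annotated feed above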
+                 gr.Markdown("📹 Start the webcam to reveal the YOLO view above. Detections update in real time, and frames are chunked every ~1 second!")
+             else:
+                 gr.Markdown("❌ YOLO not available. Install with: `pip install ultralytics`")
+
+             # Troubleshooting
+             with gr.Accordion("⚠️ Connection Troubleshooting", open=False):
+                 gr.Markdown("""
+                 **If video doesn't connect:**
+
+                 1. **Allow camera permissions** in your browser
+                 2. **Use HTTPS** - Hugging Face Spaces provides this automatically
+                 3. **Try Chrome/Edge** - best webcam streaming support
+                 4. **Wait 30-60 seconds** on first load for the YOLO model download
+                 5. **Check the browser console** for errors (F12)
+
+                 Live streaming uses browser-based webcam APIs; make sure camera access is allowed.
+                 """)
+
+         # Right column - AI query and stats
+         with gr.Column(scale=1):
+             gr.Markdown("## 🤖 AI Query Interface")
+
+             query_input = gr.Textbox(
+                 label="Ask about the video",
+                 placeholder="e.g., What objects appeared in the last 30 seconds?",
+                 lines=3
+             )
+             query_btn = gr.Button("🔍 Ask AI", variant="primary")
+             query_output = gr.Markdown("*AI response will appear here*")
+
+             gr.Markdown("---")
+
+             # Stats (auto-refresh every 10 seconds)
+             stats_display = gr.Markdown(value=get_stats, every=10)
+             refresh_btn = gr.Button("🔄 Refresh Stats", size="sm")
+
+             gr.Markdown("---")
+
+             # Current detections
+             detections_display = gr.Markdown(
+                 value=get_current_detections,
+                 every=10
+             )
+
+             gr.Markdown("---")
+
+             # Recent chunks
+             chunks_display = gr.Markdown(
+                 value=get_recent_chunks,
+                 every=10
+             )
+
+             gr.Markdown("---")
+
+             # Event log
+             gr.Markdown("### 📝 Event Log")
+             log_display = gr.Markdown(
+                 value=get_event_log,
+                 every=10
+             )
+
+     # How it works
+     with gr.Accordion("ℹ️ How This Works", open=False):
+         gr.Markdown("""
+         ### 🎯 Features:
+
+         **1. Real-time object detection:**
+         - YOLOv8 detects objects in your webcam feed
+         - Color detection identifies object colors
+         - Bounding boxes are drawn in real time
+
+         **2. Frame chunking:**
+         - Video frames are grouped into 1-second chunks (30 frames)
+         - Chunks are stored in memory (last 100) and in the vector database
+
+         **3. Vector database (ChromaDB):**
+         - Semantic embeddings of video events
+         - Similarity search across video history
+
+         **4. OpenAI integration:**
+         - GPT-4o-mini for intelligent query answering
+         - text-embedding-3-small for semantic search
+         - Context-aware responses based on video history
+
+         ### 🔧 Tech Stack:
+         - **YOLOv8**: Real-time object detection
+         - **Gradio Live Video**: Smooth webcam streaming
+         - **OpenAI GPT**: Natural language understanding
+         - **ChromaDB**: Vector similarity search
+         - **Hugging Face Spaces**: Free deployment with TURN servers
+
+         ### 💰 Costs:
+         - **Hugging Face Spaces**: Free (or $9/month PRO for better resources)
+         - **OpenAI API**: Pay-as-you-go (minimal for this use case)
+         - **TURN servers**: Free 10 GB/month via Cloudflare FastRTC
+         """)
+
+     # Event handlers
+     setup_btn.click(
+         fn=setup_api_key,
+         inputs=[api_key_input],
+         outputs=[api_status, stats_display]
+     )
+
+     query_btn.click(
+         fn=query_with_ai,
+         inputs=[query_input],
+         outputs=[query_output]
+     )
+
+     refresh_btn.click(
+         fn=lambda: [get_stats(), get_current_detections(), get_recent_chunks(), get_event_log()],
+         outputs=[stats_display, detections_display, chunks_display, log_display]
+     )
+
+ # Launch the app
+ if __name__ == "__main__":
+     demo.launch()
requirements.txt ADDED
@@ -0,0 +1,11 @@
+ # Gradio WebRTC AI Video Analysis requirements
+ # For Hugging Face Spaces deployment
+
+ gradio>=5.0.0
+ opencv-python-headless>=4.8.0
+ ultralytics>=8.0.0
+ openai>=1.0.0
+ chromadb>=0.4.0
+ numpy>=1.24.0
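+ # torch/torchvision are pulled in by ultralytics anyway; pinned here explicitly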
+ torch>=2.0.0
+ torchvision>=0.15.0