Deploy Gradio AI video analysis app

Files changed:
- README.md (+92, -7)
- gradio_ai_enhanced.py (+578, new file)
- requirements.txt (+11, new file)
README.md CHANGED

@@ -1,12 +1,97 @@
 ---
-title:
+title: AI Video Analysis
-emoji:
+emoji: 🎥
-colorFrom:
+colorFrom: blue
-colorTo:
+colorTo: green
 sdk: gradio
-sdk_version: 5.
+sdk_version: 5.0.0
-app_file:
+app_file: gradio_ai_enhanced.py
 pinned: false
+license: mit
 ---
# 🎥 AI-Enhanced Video Analysis

Real-time object detection from your webcam with AI-powered query capabilities using GPT-4o-mini and vector search.

## 🚀 Features

- **Live Object Detection**: YOLOv8 analyzes your webcam feed in real time
- **Color Recognition**: Identifies object colors (red, blue, green, etc.)
- **AI Queries**: Ask questions about what appeared in the video
- **Vector Search**: Semantic search through video history using ChromaDB
- **Frame Chunking**: Automatic grouping of video events for efficient storage

## 🎯 How to Use

1. **Enter your OpenAI API key** in the text box and click "Connect"
   - Get a key from: https://platform.openai.com/api-keys
   - Alternatively, the Space admin can set it as a repository secret

2. **Click the webcam button** to start video streaming
   - Allow camera permissions when prompted
   - Wait a few seconds for the YOLO model to load (first time only)

3. **Watch objects being detected** in real time with bounding boxes and labels

4. **Ask questions** about the video:
   - "What objects have appeared in the last minute?"
   - "When did you see a red object?"
   - "How many different objects were detected?"

## 🔧 Technical Stack

- **YOLOv8**: Real-time object detection
- **Gradio WebRTC**: Smooth video streaming with Cloudflare TURN servers
- **OpenAI GPT-4o-mini**: Natural language query understanding
- **OpenAI Embeddings**: Semantic search capabilities
- **ChromaDB**: Vector database for storing video events
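
The retrieval loop behind those last three items is small enough to sketch. The snippet below is only an illustration (simplified names, an in-memory ChromaDB client, and an `OPENAI_API_KEY` environment variable are assumed); the deployed logic lives in `gradio_ai_enhanced.py`.

```python
# Condensed sketch of the store/query path (illustrative, not the deployed code).
import chromadb
from openai import OpenAI

client = OpenAI()                       # reads OPENAI_API_KEY from the environment
chroma = chromadb.Client()              # in-memory vector store
events = chroma.get_or_create_collection("video_events")

def embed(text: str) -> list[float]:
    # Same embedding model the app uses
    return client.embeddings.create(model="text-embedding-3-small", input=text).data[0].embedding

# Store one "chunk" describing what YOLO saw
description = "At 12:00:01: Detected red cup, blue book"
events.add(ids=["chunk_0"], documents=[description], embeddings=[embed(description)])

# Later, retrieve the chunks most relevant to a question
hits = events.query(query_embeddings=[embed("When did a red object appear?")], n_results=3)
print(hits["documents"][0])
```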
## 💰 Costs

- **Hugging Face Spaces**: Free (this Space)
- **Cloudflare TURN Servers**: Free 10GB/month via Gradio FastRTC
- **OpenAI API**: Pay-as-you-go
  - Embeddings: ~$0.0001 per chunk
  - GPT-4o-mini: ~$0.0001 per query
  - Typical usage: <$1/month for moderate use
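
As a rough sanity check: at the app's ~1 chunk/second cadence, an hour of continuous streaming produces about 3,600 embedded chunks, or roughly $0.36 in embedding calls at the rate above, plus a fraction of a cent for each question asked.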
## 🛠️ Local Development

```bash
# Clone the repo
git clone https://github.com/ratandeepbansal/yolo2.git
cd yolo2

# Install dependencies
pip install -r requirements_gradio.txt

# Set up API key
cp .env.example .env
# Edit .env and add your OpenAI API key

# Run the app
python gradio_ai_enhanced.py
```

## 📝 Notes

- First load takes ~30-60 seconds to download the YOLOv8n model (~6MB)
- WebRTC works best in Chrome/Edge browsers
- Camera permissions are required for webcam access
- HTTPS is required (automatically provided by HF Spaces)

## 🤝 Contributing

This is an open-source project. Feel free to:
- Report issues
- Suggest features
- Submit pull requests

## 📄 License

MIT License - see the LICENSE file for details

---

Built with ❤️ using Gradio and YOLOv8
gradio_ai_enhanced.py ADDED

@@ -0,0 +1,578 @@
"""
AI-Enhanced Video Analysis with Gradio Live Video
Features: Real-time YOLO detection, GPT queries, Vector DB storage
Optimized for Hugging Face Spaces deployment
"""

import gradio as gr
import cv2
import numpy as np
from collections import deque
import time
from datetime import datetime
import json
import os
from threading import Lock

# Ensure Ultralytics writes settings/cache inside the project workspace
ULTRALYTICS_BASE = os.path.join(os.path.dirname(__file__), ".ultralytics")
os.environ.setdefault("ULTRALYTICS_SETTINGS_DIR", ULTRALYTICS_BASE)
os.environ.setdefault("ULTRALYTICS_CACHE_DIR", os.path.join(ULTRALYTICS_BASE, "cache"))
os.makedirs(os.environ["ULTRALYTICS_SETTINGS_DIR"], exist_ok=True)
os.makedirs(os.environ["ULTRALYTICS_CACHE_DIR"], exist_ok=True)

# AI & Vector DB imports
from openai import OpenAI
import chromadb
from chromadb.config import Settings

# YOLO import
try:
    from ultralytics import YOLO
    YOLO_AVAILABLE = True
except ImportError:
    YOLO_AVAILABLE = False

# Global state management
class VideoAnalysisState:
    def __init__(self):
        self.lock = Lock()
        self.frame_chunks = deque(maxlen=100)
        self.chunk_id = 0
        self.detected_objects = []
        self.pending_chunks = []
        self.event_log = deque(maxlen=50)
        self.openai_client = None
        self.chroma_client = None
        self.video_collection = None
        self.model = None
        self.frames_processed = 0

    def init_openai(self, api_key):
        """Initialize OpenAI client"""
        if not api_key:
            return False
        try:
            self.openai_client = OpenAI(api_key=api_key)
            # Test the connection
            self.openai_client.models.list()
            return True
        except Exception as e:
            self.event_log.append(f"❌ OpenAI error: {str(e)[:50]}")
            return False

    def init_vector_db(self):
        """Initialize ChromaDB"""
        try:
            self.chroma_client = chromadb.Client(Settings(
                anonymized_telemetry=False,
                allow_reset=True
            ))
            self.video_collection = self.chroma_client.get_or_create_collection(
                name="video_events",
                metadata={"hnsw:space": "cosine"}
            )
            return True
        except Exception as e:
            self.event_log.append(f"❌ Vector DB error: {str(e)[:50]}")
            return False

    def init_yolo(self):
        """Initialize YOLO model"""
        if YOLO_AVAILABLE and self.model is None:
            try:
                self.model = YOLO('yolov8n.pt')
                self.event_log.append("✅ YOLO model loaded")
                return True
            except Exception as e:
                self.event_log.append(f"❌ YOLO error: {str(e)[:50]}")
                return False
        return self.model is not None
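
# The module-level state below is shared by every visitor to the Space, so
# detections, chunks, and the vector store are global rather than per-session.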
# Global state
state = VideoAnalysisState()

def get_dominant_color(image_region):
    """Get dominant color from image region"""
    if image_region.size == 0:
        return "unknown"

    hsv = cv2.cvtColor(image_region, cv2.COLOR_BGR2HSV)
    h = np.mean(hsv[:, :, 0])
    s = np.mean(hsv[:, :, 1])
    v = np.mean(hsv[:, :, 2])
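
    # OpenCV stores hue on a 0-179 scale for 8-bit images: low saturation is treated
    # as black/white/gray, and red wraps around both ends of the hue range below.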
    if s < 40:
        if v < 50:
            return "black"
        elif v > 200:
            return "white"
        else:
            return "gray"

    if h < 10 or h > 160:
        return "red"
    elif h < 25:
        return "orange"
    elif h < 35:
        return "yellow"
    elif h < 85:
        return "green"
    elif h < 125:
        return "blue"
    elif h < 155:
        return "purple"
    else:
        return "pink"

def process_frame(frame):
    """Process video frame with YOLO detection"""
    if frame is None:
        return gr.update(value=None, visible=False)

    if state.model is None:
        return gr.update(value=frame, visible=True)

    # Convert incoming RGB frame to BGR for OpenCV/YOLO processing
    frame_bgr = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)

    try:
        # Run YOLO detection
        results = state.model(frame_bgr, conf=0.4, verbose=False)

        detected_objects = []
        events_text = []

        for r in results:
            boxes = r.boxes
            if boxes is not None:
                for box in boxes:
                    x1, y1, x2, y2 = map(int, box.xyxy[0].tolist())
                    conf = box.conf[0].item()
                    cls = int(box.cls[0].item())
                    label = state.model.names[cls]

                    # Get color
                    try:
                        roi = frame_bgr[y1:y2, x1:x2]
                        color = get_dominant_color(roi)
                    except:
                        color = "unknown"

                    detected_objects.append({
                        'label': label,
                        'color': color,
                        'confidence': conf,
                        'bbox': (x1, y1, x2, y2)
                    })

                    events_text.append(f"{color} {label}")

                    # Draw bounding box
                    cv2.rectangle(frame_bgr, (x1, y1), (x2, y2), (0, 255, 0), 2)

                    # Draw label
                    text = f"{color} {label} {conf:.2f}"
                    cv2.putText(frame_bgr, text, (x1, y1-10),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

        # Update state (thread-safe)
        with state.lock:
            state.detected_objects = detected_objects
            state.frames_processed += 1
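
            # Assuming roughly 30 frames per second, this groups detections into about
            # one chunk per second; the chunk text is embedded later, in
            # process_pending_chunks(), so the frame loop never waits on the OpenAI API.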
            # Create chunks every 30 frames
            if state.chunk_id % 30 == 0 and events_text:
                chunk_description = f"At {datetime.now().strftime('%H:%M:%S')}: Detected {', '.join(events_text)}"

                state.frame_chunks.append({
                    'id': state.chunk_id,
                    'timestamp': time.time(),
                    'description': chunk_description,
                    'objects': detected_objects.copy()
                })

                state.pending_chunks.append({
                    'id': state.chunk_id,
                    'description': chunk_description,
                    'timestamp': time.time(),
                    'object_count': len(detected_objects)
                })

            state.chunk_id += 1
            chunk_count = len(state.frame_chunks)

        # Add stats overlay
        cv2.putText(frame_bgr, f"Objects: {len(detected_objects)} | Chunks: {chunk_count}",
                    (10, 30), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 255, 0), 2)

        # Convert back to RGB for display in Gradio
        frame_rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
        return gr.update(value=frame_rgb, visible=True)

    except Exception as e:
        state.event_log.append(f"❌ Frame error: {str(e)[:50]}")
        return gr.update(value=frame, visible=True)

def get_embedding(text):
    """Get embeddings from OpenAI"""
    if not state.openai_client:
        return None
    try:
        response = state.openai_client.embeddings.create(
            model="text-embedding-3-small",
            input=text
        )
        return response.data[0].embedding
    except Exception as e:
        state.event_log.append(f"❌ Embedding error: {str(e)[:50]}")
        return None
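
# Pending chunk descriptions are embedded in small batches (five per call) and are
# flushed lazily from query_with_ai() rather than on every frame.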
def process_pending_chunks():
    """Process chunks waiting to be embedded"""
    with state.lock:
        if not state.pending_chunks or not state.video_collection:
            return 0
        chunks_to_process = state.pending_chunks[:5]

    processed = 0
    for chunk in chunks_to_process:
        try:
            embedding = get_embedding(chunk['description'])
            if embedding:
                state.video_collection.add(
                    documents=[chunk['description']],
                    embeddings=[embedding],
                    ids=[f"chunk_{chunk['id']}"],
                    metadatas=[{
                        'timestamp': chunk['timestamp'],
                        'object_count': chunk['object_count']
                    }]
                )
                with state.lock:
                    state.pending_chunks.remove(chunk)
                processed += 1
        except Exception as e:
            state.event_log.append(f"❌ Embed error: {str(e)[:30]}")
            break

    return processed
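
# Retrieval-augmented answering: embed the question, pull the five most similar
# stored chunks from ChromaDB, combine them with the current-frame detections,
# and let gpt-4o-mini compose the reply.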
def query_with_ai(question):
    """Answer questions using GPT with vector database context"""
    if not state.openai_client:
        return "⚠️ Please enter your OpenAI API key first."

    if not question or not question.strip():
        return "⚠️ Please enter a question."

    try:
        # Process pending chunks
        with state.lock:
            has_pending = len(state.pending_chunks) > 0

        if has_pending:
            processed = process_pending_chunks()
            if processed > 0:
                state.event_log.append(f"✅ Embedded {processed} chunks")

        # Get context from vector DB
        context_docs = []
        if state.video_collection:
            question_embedding = get_embedding(question)
            if question_embedding:
                results = state.video_collection.query(
                    query_embeddings=[question_embedding],
                    n_results=5
                )
                if results and results['documents']:
                    context_docs = results['documents'][0]

        context = "\n".join(context_docs) if context_docs else "No video events stored yet."

        # Get current state
        with state.lock:
            current_objects = state.detected_objects.copy()
            frames_seen = state.frames_processed

        if current_objects:
            obj_descriptions = [f"{o['color']} {o['label']}" for o in current_objects]
            current_state = f"Currently visible: {', '.join(obj_descriptions)}"
        else:
            if frames_seen > 0:
                current_state = "Video stream active but no objects detected in the latest frame."
            else:
                current_state = "No video frames processed yet."

        if not context_docs and frames_seen > 0:
            context = "Video stream active, waiting for notable detections to log."

        # Create prompt
        prompt = f"""You are a video analysis assistant. Answer the question based on the video footage context.

Video Event History (from vector database):
{context}

Current Frame:
{current_state}

Question: {question}

Provide a concise, helpful answer based on the video data."""

        # Call GPT
        response = state.openai_client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[
                {"role": "system", "content": "You are a helpful video analysis assistant."},
                {"role": "user", "content": prompt}
            ],
            temperature=0.7,
            max_tokens=200
        )

        answer = response.choices[0].message.content
        state.event_log.append("✅ Query answered")
        return f"**AI Answer:**\n\n{answer}"

    except Exception as e:
        error_msg = f"Error querying AI: {str(e)}"
        state.event_log.append(f"❌ Query error: {str(e)[:30]}")
        return error_msg

def setup_api_key(api_key):
    """Setup OpenAI API key and initialize services"""
    if not api_key or not api_key.strip():
        return "⚠️ Please enter a valid API key", get_stats()

    success = state.init_openai(api_key)
    if success:
        state.init_vector_db()
        state.init_yolo()
        return "✅ OpenAI connected! Vector DB initialized!", get_stats()
    else:
        return "❌ Failed to connect to OpenAI. Check your API key.", get_stats()

def get_stats():
    """Get current system statistics"""
    with state.lock:
        chunks = len(state.frame_chunks)
        objects = len(state.detected_objects)
        pending = len(state.pending_chunks)

    vector_count = 0
    if state.video_collection:
        try:
            vector_count = state.video_collection.count()
        except:
            vector_count = 0

    stats = f"""**System Status:**
- Chunks Stored: {chunks}
- Current Objects: {objects}
- Pending Embeddings: {pending}
- Vector DB Entries: {vector_count}
"""
    return stats

def get_current_detections():
    """Get list of currently detected objects"""
    with state.lock:
        current = state.detected_objects.copy()

    if not current:
        return "No objects detected"

    output = "**Current Detections:**\n\n"
    for i, obj in enumerate(current):
        output += f"{i+1}. {obj['color']} {obj['label']} ({obj['confidence']:.2f})\n"

    return output

def get_recent_chunks():
    """Get recent video chunks"""
    with state.lock:
        recent = list(state.frame_chunks)[-5:]

    if not recent:
        return "No chunks yet - start the video!"

    output = "**Recent Video Chunks:**\n\n"
    for chunk in recent:
        output += f"[{chunk['id']}] {chunk['description']}\n\n"

    return output

def get_event_log():
    """Get recent event log"""
    with state.lock:
        events = list(state.event_log)[-10:]

    if not events:
        return "No events yet"

    return "\n".join(events)

# Initialize YOLO on startup
state.init_yolo()

# Build Gradio interface
with gr.Blocks(title="AI Video Analysis", theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🎥 AI-Enhanced Video Analysis")
    gr.Markdown("*Real-time object detection with GPT queries and vector database storage*")

    with gr.Row():
        # Left column - Video and controls
        with gr.Column(scale=2):
            gr.Markdown("## 📹 Live Video Feed")

            # API Key setup
            with gr.Row():
                api_key_input = gr.Textbox(
                    label="OpenAI API Key",
                    type="password",
                    placeholder="sk-...",
                    scale=3
                )
                setup_btn = gr.Button("Connect", scale=1, variant="primary")

            api_status = gr.Markdown("⚠️ Enter your OpenAI API key to enable AI features")

            # Live Video Stream
            if YOLO_AVAILABLE:
                processed_feed = gr.Image(
                    label="YOLO Detection Feed",
                    interactive=False,
                    type="numpy",
                    visible=False
                )
                webcam_stream = gr.Image(
                    label="Webcam Stream",
                    sources=["webcam"],
                    streaming=True,
                    type="numpy"
                )
                webcam_stream.stream(
                    fn=process_frame,
                    inputs=webcam_stream,
                    outputs=processed_feed
                )
                gr.Markdown("📹 Start the webcam to reveal the YOLO view above. Detections update in real time and frames are chunked every ~1 second!")
            else:
                gr.Markdown("❌ YOLO not available. Install with: `pip install ultralytics`")

            # Troubleshooting
            with gr.Accordion("⚠️ Connection Troubleshooting", open=False):
                gr.Markdown("""
**If video doesn't connect:**

1. **Allow camera permissions** in your browser
2. **Use HTTPS** - Hugging Face Spaces provides this automatically
3. **Try Chrome/Edge** - Best webcam streaming support
4. **Wait 30-60 seconds** on first load for the YOLO model download
5. **Check browser console** for errors (F12)

Live streaming uses browser-based webcam APIs; ensure camera access is allowed.
""")

        # Right column - AI Query and Stats
        with gr.Column(scale=1):
            gr.Markdown("## 🤖 AI Query Interface")

            query_input = gr.Textbox(
                label="Ask about the video",
                placeholder="e.g., What objects appeared in the last 30 seconds?",
                lines=3
            )
            query_btn = gr.Button("🔍 Ask AI", variant="primary")
            query_output = gr.Markdown("*AI response will appear here*")

            gr.Markdown("---")

            # Stats
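            # Passing a callable as value with every=10 makes Gradio re-evaluate the
            # function every 10 seconds, so these panels refresh on their own.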
            stats_display = gr.Markdown(value=get_stats, every=10)
            refresh_btn = gr.Button("🔄 Refresh Stats", size="sm")

            gr.Markdown("---")

            # Current detections
            detections_display = gr.Markdown(
                value=get_current_detections,
                every=10
            )

            gr.Markdown("---")

            # Recent chunks
            chunks_display = gr.Markdown(
                value=get_recent_chunks,
                every=10
            )

            gr.Markdown("---")

            # Event log
            gr.Markdown("### 📋 Event Log")
            log_display = gr.Markdown(
                value=get_event_log,
                every=10
            )

    # How it works
    with gr.Accordion("ℹ️ How This Works", open=False):
        gr.Markdown("""
### 🎯 Features:

**1. Real-time Object Detection:**
- YOLOv8 detects objects in your webcam feed
- Color detection identifies object colors
- Bounding boxes drawn in real-time

**2. Frame Chunking:**
- Video frames grouped into 1-second chunks (30 frames)
- Chunks stored in memory (last 100) and vector database

**3. Vector Database (ChromaDB):**
- Semantic embeddings of video events
- Similarity search across video history

**4. OpenAI Integration:**
- GPT-4o-mini for intelligent query answering
- text-embedding-3-small for semantic search
- Context-aware responses based on video history

### 🔧 Tech Stack:
- **YOLOv8**: Real-time object detection
- **Gradio Live Video**: Smooth webcam streaming
- **OpenAI GPT**: Natural language understanding
- **ChromaDB**: Vector similarity search
- **Hugging Face Spaces**: Free deployment with TURN servers

### 💰 Costs:
- **Hugging Face Spaces**: Free (or $9/month PRO for better resources)
- **OpenAI API**: Pay-as-you-go (minimal for this use case)
- **TURN Servers**: Free 10GB/month via Cloudflare FastRTC
""")

    # Event handlers
    setup_btn.click(
        fn=setup_api_key,
        inputs=[api_key_input],
        outputs=[api_status, stats_display]
    )

    query_btn.click(
        fn=query_with_ai,
        inputs=[query_input],
        outputs=[query_output]
    )

    refresh_btn.click(
        fn=lambda: [get_stats(), get_current_detections(), get_recent_chunks(), get_event_log()],
        outputs=[stats_display, detections_display, chunks_display, log_display]
    )

# Launch the app
if __name__ == "__main__":
    demo.launch()
requirements.txt ADDED

@@ -0,0 +1,11 @@
# Gradio WebRTC AI Video Analysis Requirements
# For Hugging Face Spaces deployment

gradio>=5.0.0
opencv-python-headless>=4.8.0
ultralytics>=8.0.0
openai>=1.0.0
chromadb>=0.4.0
numpy>=1.24.0
torch>=2.0.0
torchvision>=0.15.0