# CompI Phase 3.E: Performance, Model Management & Reliability - Complete Guide

## **What Phase 3.E Delivers**

**Phase 3.E transforms CompI into a production-grade platform with professional performance management, intelligent reliability, and advanced model capabilities.**

### **Model Manager**

- **Dynamic Model Switching**: Switch between SD 1.5 and SDXL based on requirements
- **Auto-Availability Checking**: Intelligent detection of model compatibility and VRAM requirements
- **Universal LoRA Support**: Load and scale LoRA weights across all models and generation modes
- **Smart Recommendations**: Hardware-based model suggestions and optimization advice

### **Performance Controls**

- **xFormers Integration**: Memory-efficient attention with automatic fallback
- **Advanced Memory Optimization**: Attention slicing, VAE slicing/tiling, CPU offloading
- **Precision Control**: Automatic dtype selection (fp16/bf16/fp32) based on hardware
- **Batch Optimization**: Memory-aware batch processing with intelligent sizing

### **VRAM Monitoring**

- **Real-time Tracking**: Live GPU memory usage monitoring and alerts
- **Usage Analytics**: Memory usage patterns and optimization suggestions
- **Threshold Warnings**: Automatic alerts when approaching memory limits
- **Cache Management**: Intelligent GPU cache clearing and memory cleanup

### **Reliability Engine**

- **OOM-Safe Generation**: Automatic retry with progressive fallback strategies
- **Intelligent Fallbacks**: Reduce size → reduce steps → CPU fallback progression
- **Error Classification**: Smart error detection and appropriate response strategies
- **Graceful Degradation**: Maintain functionality even under resource constraints

### **Batch Processing**

- **Seed-Controlled Batches**: Deterministic seed sequences for reproducible results
- **Memory-Aware Batching**: Automatic batch size optimization based on available VRAM
- **Progress Tracking**: Detailed progress monitoring with per-image status
- **Failure Recovery**: Continue batch processing even if individual images fail

### **Upscaler Integration**

- **Latent Upscaler**: Optional 2x upscaling using the Stable Diffusion Latent Upscaler (see the sketch after this list)
- **Graceful Degradation**: Clean fallback when the upscaler is unavailable
- **Memory Management**: Intelligent memory allocation for upscaling operations
- **Quality Enhancement**: Professional-grade image enhancement capabilities
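For orientation, the optional upscaling path can be pictured with the standard `diffusers` latent upscaler. The sketch below shows 2x upscaling with graceful degradation when the upscaler cannot be loaded; the model IDs, prompt, and fallback behaviour are illustrative assumptions, not CompI's exact code.

```python
# Minimal sketch (not the exact CompI code): optional 2x latent upscaling
# with graceful fallback when the upscaler cannot be loaded.
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionLatentUpscalePipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.float16 if device == "cuda" else torch.float32

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=dtype
).to(device)

try:
    upscaler = StableDiffusionLatentUpscalePipeline.from_pretrained(
        "stabilityai/sd-x2-latent-upscaler", torch_dtype=dtype
    ).to(device)
except Exception:
    upscaler = None  # graceful degradation: continue without upscaling

prompt = "a lighthouse at dawn, oil painting"
generator = torch.Generator(device=device).manual_seed(42)

if upscaler is not None:
    # Keep the base output in latent space, then upscale it 2x.
    low_res_latents = pipe(prompt, output_type="latent", generator=generator).images
    image = upscaler(
        prompt=prompt, image=low_res_latents,
        num_inference_steps=20, guidance_scale=0, generator=generator,
    ).images[0]
else:
    image = pipe(prompt, generator=generator).images[0]

image.save("output.png")
```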
---

## **Quick Start Guide**

### **1. Launch Phase 3.E**

```bash
# Method 1: Using launcher script (recommended)
python run_phase3e_performance_manager.py

# Method 2: Direct Streamlit launch
streamlit run src/ui/compi_phase3e_performance_manager.py --server.port 8505
```

### **2. System Requirements Check**

The launcher automatically checks the following (a probe sketch follows the list):

- **GPU Setup**: CUDA availability and VRAM capacity
- **Dependencies**: Required and optional packages
- **Model Support**: SD 1.5 and SDXL availability
- **Performance Features**: xFormers and upscaler support
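The real checks live in the launcher script; a minimal sketch of this kind of environment probe (the `check_environment` helper and its report keys are illustrative, not CompI's API) could look like this:

```python
# Illustrative environment probe (not the launcher's actual code).
import importlib.util
import torch

def check_environment() -> dict:
    report = {}
    # GPU setup: CUDA availability and VRAM capacity
    report["cuda"] = torch.cuda.is_available()
    if report["cuda"]:
        props = torch.cuda.get_device_properties(0)
        report["gpu"] = props.name
        report["vram_gb"] = round(props.total_memory / 1024**3, 1)
    # Optional performance features: present only if importable
    report["xformers"] = importlib.util.find_spec("xformers") is not None
    report["diffusers"] = importlib.util.find_spec("diffusers") is not None
    return report

if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```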
### **3. Access the Interface**

- **URL:** `http://localhost:8505`
- **Interface:** Professional Streamlit dashboard with real-time monitoring
- **Sidebar:** Live VRAM monitoring and system status

---

## **Professional Workflow**

### **Step 1: Model Selection**

1. **Choose Base Model**: SD 1.5 (fast, compatible) or SDXL (high quality, more VRAM)
2. **Select Generation Mode**: txt2img or img2img
3. **Check Compatibility**: System automatically validates model/mode combinations
4. **Review VRAM Requirements**: See memory requirements and availability status

### **Step 2: LoRA Integration (Optional)**

1. **Enable LoRA**: Toggle LoRA support
2. **Specify Path**: Enter path to LoRA weights (diffusers format)
3. **Set Scale**: Adjust LoRA influence (0.1-2.0)
4. **Verify Status**: Check LoRA loading status and compatibility (see the sketch below)
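Behind the UI, attaching diffusers-format LoRA weights follows the standard `diffusers` pattern sketched below; the weight path and scale value are placeholders, and the error handling is an assumption rather than CompI's exact loader.

```python
# Sketch of attaching diffusers-format LoRA weights to a loaded pipeline.
# The LoRA path below is a placeholder, not a real CompI asset.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

try:
    pipe.load_lora_weights("path/to/lora_weights")  # directory or .safetensors file
    lora_loaded = True
except Exception as err:
    lora_loaded = False  # e.g. wrong format or base-model mismatch
    print(f"LoRA disabled: {err}")

# The LoRA scale (the UI's 0.1-2.0 slider) is applied at call time.
image = pipe(
    "a watercolor fox in a forest",
    cross_attention_kwargs={"scale": 0.8} if lora_loaded else None,
).images[0]
```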
### **Step 3: Performance Optimization**

1. **Choose Optimization Level**: Conservative, Balanced, Aggressive, or Extreme
2. **Monitor VRAM**: Watch real-time memory usage in the sidebar
3. **Adjust Settings**: Fine-tune individual optimization features (see the sketch below)
4. **Enable Reliability**: Configure OOM retry and CPU fallback options
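The individual features correspond to standard `diffusers` pipeline switches. The `apply_optimizations` helper below is an illustrative sketch of wiring UI toggles to those switches, not CompI's actual implementation.

```python
# Illustrative mapping from UI toggles to standard diffusers optimizations.
def apply_optimizations(pipe, use_xformers=True, attention_slicing=True,
                        vae_slicing=True, vae_tiling=False, cpu_offload=False):
    if use_xformers:
        try:
            pipe.enable_xformers_memory_efficient_attention()
        except Exception:
            pass  # xFormers missing or incompatible: fall back silently
    if attention_slicing:
        pipe.enable_attention_slicing()
    if vae_slicing:
        pipe.enable_vae_slicing()
    if vae_tiling:
        pipe.enable_vae_tiling()
    if cpu_offload:
        # Moves submodules to the GPU only while they are needed (slower, low VRAM).
        pipe.enable_model_cpu_offload()
    return pipe
```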
### **Step 4: Generation**

1. **Single Images**: Generate individual images with full control
2. **Batch Processing**: Create multiple images with seed sequences (see the sketch below)
3. **Monitor Progress**: Track generation progress and memory usage
4. **Review Results**: Analyze generation statistics and performance metrics
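Conceptually, a seed-controlled batch derives one deterministic seed per image and frees the GPU cache between generations. The sketch below illustrates the loop; the `base_seed + i` convention and `generate_batch` helper are assumptions, not necessarily CompI's exact scheme.

```python
# Sketch of a seed-controlled, memory-aware batch loop.
import torch

def generate_batch(pipe, prompt, base_seed=1234, count=4, **kwargs):
    images, failures = [], []
    for i in range(count):
        seed = base_seed + i  # deterministic, reproducible seed sequence
        generator = torch.Generator(device=pipe.device).manual_seed(seed)
        try:
            image = pipe(prompt, generator=generator, **kwargs).images[0]
            images.append((seed, image))
        except Exception as err:
            failures.append((seed, str(err)))  # keep going if one image fails
        finally:
            if torch.cuda.is_available():
                torch.cuda.empty_cache()  # avoid memory accumulation between images
    return images, failures
```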
---

## **Advanced Features**

### **Model Manager Deep Dive**

#### **Model Compatibility Matrix**

```
SD 1.5:
  txt2img    - supported (512x512 optimal)
  img2img    - supported (all strengths)
  ControlNet - full support
  LoRA       - universal compatibility
  VRAM: 4+ GB recommended

SDXL:
  txt2img    - supported (1024x1024 optimal)
  img2img    - limited support
  ControlNet - requires special handling
  LoRA       - SDXL-compatible weights only
  VRAM: 8+ GB recommended
```

#### **Automatic Model Selection Logic**

- **VRAM < 6 GB**: Recommends SD 1.5 only
- **VRAM 6-8 GB**: SD 1.5 preferred, SDXL with warnings
- **VRAM 8 GB+**: Full SDXL support with optimizations
- **CPU Mode**: SD 1.5 only with aggressive optimizations (rule sketched below)
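Written as a plain function, that rule might look like the sketch below; the thresholds come from the list above, and the `recommend_model` helper is illustrative rather than CompI's API.

```python
# Illustrative model recommendation based on available VRAM.
import torch

def recommend_model() -> str:
    if not torch.cuda.is_available():
        return "SD 1.5 (CPU mode, aggressive optimizations)"
    vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
    if vram_gb < 6:
        return "SD 1.5"
    if vram_gb < 8:
        return "SD 1.5 preferred; SDXL possible with warnings"
    return "SDXL (full support with optimizations)"
```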
### **Performance Optimization Levels**

#### **Conservative Mode**

- Basic attention slicing
- Standard precision (fp16/fp32)
- Minimal memory optimizations
- **Best for**: Stable systems, first-time users

#### **Balanced Mode (Default)**

- xFormers attention (if available)
- Attention + VAE slicing
- Automatic precision selection
- **Best for**: Most users, good performance/stability balance

#### **Aggressive Mode**

- All memory optimizations enabled
- VAE tiling for large images
- Maximum memory efficiency
- **Best for**: Limited VRAM, large batch processing

#### **Extreme Mode**

- CPU offloading enabled
- Maximum memory savings
- Slower but uses minimal VRAM
- **Best for**: Very limited VRAM (<4 GB); see the preset sketch below
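One way to picture the four levels is as presets over the toggles from the `apply_optimizations` sketch earlier. The flag values below are inferred from the descriptions above and are not CompI's definitive configuration.

```python
# Assumed preset-to-toggle mapping, for illustration only.
OPTIMIZATION_PRESETS = {
    "conservative": dict(use_xformers=False, attention_slicing=True,
                         vae_slicing=False, vae_tiling=False, cpu_offload=False),
    "balanced":     dict(use_xformers=True,  attention_slicing=True,
                         vae_slicing=True,   vae_tiling=False, cpu_offload=False),
    "aggressive":   dict(use_xformers=True,  attention_slicing=True,
                         vae_slicing=True,   vae_tiling=True,  cpu_offload=False),
    "extreme":      dict(use_xformers=True,  attention_slicing=True,
                         vae_slicing=True,   vae_tiling=True,  cpu_offload=True),
}

# Usage (with the illustrative helper from earlier):
# pipe = apply_optimizations(pipe, **OPTIMIZATION_PRESETS["balanced"])
```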
### **Reliability Engine Strategies**

#### **Fallback Progression**

```
Strategy 1: Original settings (100% size, 100% steps)
Strategy 2: Reduced size (75% size, 90% steps)
Strategy 3: Half size (50% size, 80% steps)
Strategy 4: Minimal (50% size, 60% steps)
Final: CPU fallback if all GPU attempts fail
```
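In code, that progression amounts to catching CUDA out-of-memory errors and retrying with scaled-down settings. The sketch below reuses the percentages from the table; the `generate_with_fallback` helper is illustrative, not the actual reliability engine.

```python
# Illustrative OOM-safe generation with progressive fallback.
import torch

FALLBACKS = [(1.00, 1.00), (0.75, 0.90), (0.50, 0.80), (0.50, 0.60)]  # (size, steps)

def generate_with_fallback(pipe, prompt, width=512, height=512, steps=30):
    for size_scale, step_scale in FALLBACKS:
        try:
            return pipe(
                prompt,
                width=int(width * size_scale) // 8 * 8,   # keep dimensions divisible by 8
                height=int(height * size_scale) // 8 * 8,
                num_inference_steps=max(1, int(steps * step_scale)),
            ).images[0]
        except torch.cuda.OutOfMemoryError:
            torch.cuda.empty_cache()  # free what we can, then try the next strategy
    # Final strategy: move the pipeline to CPU and try once more
    # (in practice it would be reloaded in fp32 for CPU use).
    pipe = pipe.to("cpu")
    return pipe(prompt, width=width, height=height, num_inference_steps=steps).images[0]
```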
#### **Error Classification**

- **CUDA OOM**: Triggers progressive fallback
- **Model Loading**: Suggests alternative models
- **LoRA Errors**: Disables LoRA and retries
- **General Errors**: Logs and reports with context

### **VRAM Monitoring System**

#### **Real-time Metrics**

- **Total VRAM**: Hardware capacity
- **Used VRAM**: Currently allocated memory
- **Free VRAM**: Available for new operations
- **Usage Percentage**: Current utilization level (see the query sketch below)
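All four metrics can be read directly from PyTorch; a minimal sketch of the query (the `vram_stats` helper and its dictionary keys are illustrative) is:

```python
# Reading the sidebar's VRAM metrics directly from PyTorch.
import torch

def vram_stats(device_index: int = 0) -> dict:
    if not torch.cuda.is_available():
        return {"available": False}
    free_bytes, total_bytes = torch.cuda.mem_get_info(device_index)
    used_bytes = total_bytes - free_bytes
    return {
        "available": True,
        "total_gb": total_bytes / 1024**3,
        "used_gb": used_bytes / 1024**3,
        "free_gb": free_bytes / 1024**3,
        "usage_pct": 100.0 * used_bytes / total_bytes,
    }
```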
#### **Smart Alerts**

- **Green (0-60%)**: Optimal usage
- **Yellow (60-80%)**: Moderate usage, monitor closely
- **Red (80%+)**: High usage, optimization recommended

#### **Memory Management**

- **Automatic Cache Clearing**: Between batch generations
- **Memory Leak Detection**: Identifies and resolves memory issues
- **Optimization Suggestions**: Hardware-specific recommendations

---

## **Performance Benchmarks**

### **Generation Speed Comparison**

```
SD 1.5 (512x512, 20 steps):
  RTX 4090: ~15-25 seconds
  RTX 3080: ~25-35 seconds
  RTX 2080: ~45-60 seconds
  CPU:      ~5-10 minutes

SDXL (1024x1024, 20 steps):
  RTX 4090: ~30-45 seconds
  RTX 3080: ~60-90 seconds
  RTX 2080: ~2-3 minutes (with optimizations)
  CPU:      ~15-30 minutes
```

### **Memory Usage Patterns**

```
SD 1.5:
  Base:       ~3.5 GB VRAM
  + LoRA:     ~3.7 GB VRAM
  + Upscaler: ~5.5 GB VRAM

SDXL:
  Base:       ~6.5 GB VRAM
  + LoRA:     ~7.0 GB VRAM
  + Upscaler: ~9.0 GB VRAM
```

---

## **Troubleshooting Guide**

### **Common Issues & Solutions**

#### **"CUDA Out of Memory" Errors**

1. **Enable OOM Auto-Retry**: Automatic fallback handling
2. **Reduce Image Size**: Use 512x512 instead of 1024x1024
3. **Lower Batch Size**: Generate fewer images simultaneously
4. **Enable Aggressive Optimizations**: Use VAE slicing/tiling
5. **Clear GPU Cache**: Use the sidebar "Clear GPU Cache" button (or the manual sketch below)
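If the button is not enough, the same cleanup can be done from Python with the generic PyTorch idiom below; it is not CompI-specific code.

```python
# Manual GPU cache cleanup (generic PyTorch idiom).
import gc
import torch

def clear_gpu_cache() -> None:
    gc.collect()                     # drop unreferenced Python objects first
    if torch.cuda.is_available():
        torch.cuda.empty_cache()     # release cached CUDA memory back to the driver
        torch.cuda.ipc_collect()     # reclaim memory from destroyed IPC handles
```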
#### **Slow Generation Speed**

1. **Enable xFormers**: Significant speed improvement if available
2. **Use Balanced Optimization**: Good speed/quality trade-off
3. **Reduce Inference Steps**: 15-20 steps are often sufficient
4. **Check VRAM Usage**: Ensure you are not hitting memory limits

#### **Model Loading Failures**

1. **Check Internet Connection**: Models download on first use
2. **Verify Disk Space**: Models require 2-7 GB of storage each
3. **Try Alternative Model**: Switch between SD 1.5 and SDXL
4. **Clear Model Cache**: Remove cached models and re-download

#### **LoRA Loading Issues**

1. **Verify Path**: Ensure LoRA files exist at the specified path
2. **Check Format**: Use diffusers-compatible LoRA weights
3. **Model Compatibility**: Ensure the LoRA matches the base model type
4. **Scale Adjustment**: Try different LoRA scale values

---

## **Best Practices**

### **Performance Optimization**

1. **Start Conservative**: Begin with the Conservative or Balanced level and adjust as needed
2. **Monitor VRAM**: Keep usage below 80% for stability
3. **Batch Wisely**: Use smaller batches on limited hardware
4. **Clear Cache Regularly**: Prevent memory accumulation

### **Model Selection**

1. **SD 1.5 for Speed**: Faster generation, lower VRAM requirements
2. **SDXL for Quality**: Higher resolution, better detail
3. **Match Hardware**: Choose the model based on available VRAM
4. **Test Compatibility**: Verify the model works with your use case

### **Reliability**

1. **Enable Auto-Retry**: Let the system handle OOM errors automatically
2. **Use Fallbacks**: Allow progressive degradation for reliability
3. **Monitor Logs**: Check run logs for patterns and issues
4. **Plan for Failures**: Design workflows that handle generation failures

---

## **Integration with CompI Ecosystem**

### **Universal Enhancement**

Phase 3.E enhances ALL existing CompI components:

- **Ultimate Dashboard**: Model switching and performance controls
- **Phase 2.A-2.E**: Reliability and optimization for all multimodal phases
- **Phase 1.A-1.E**: Enhanced foundation with professional features
- **Phase 3.D**: Performance metrics in workflow management

### **Backward Compatibility**

- **Graceful Degradation**: Works on all hardware configurations
- **Default Settings**: Optimal defaults for most users
- **Progressive Enhancement**: Advanced features when available
- **Legacy Support**: Maintains compatibility with existing workflows

---

## **Phase 3.E: Production-Grade CompI Complete**

**With Phase 3.E, CompI becomes a production-grade platform combining professional performance management, intelligent reliability, and advanced model capabilities.**

**Key Benefits:**

- **Professional Performance**: Industry-standard optimization and monitoring
- **Intelligent Reliability**: Automatic error handling and recovery
- **Advanced Model Management**: Dynamic switching and LoRA integration
- **Production Ready**: Suitable for commercial and professional use
- **Universal Enhancement**: Improves all existing CompI features

**CompI is now a complete, production-grade multimodal AI art generation platform!**