Joseph Pollack committed on
Commit 23d4aef · unverified · 1 Parent(s): d5b2cea
.gitignore ADDED
@@ -0,0 +1 @@
+ ignore/
README.md CHANGED
@@ -6,9 +6,325 @@ colorTo: green
 sdk: gradio
 sdk_version: 5.44.0
 app_file: app.py
- pinned: false
+ pinned: true
 license: gpl
 short_description: demo of l-operator with no commands
 ---

 Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # 🤖 L-Operator: Android Device Control Demo
+
+ A complete multimodal Gradio demo for the [L-Operator model](https://huggingface.co/Tonic/l-android-control), a fine-tuned multimodal AI agent based on LiquidAI's LFM2-VL-1.6B model, optimized for Android device control through visual understanding and action generation.
+
+ ## 🌟 Features
+
+ - **Multimodal Interface**: Upload Android screenshots and provide text instructions
+ - **Chat Interface**: Interactive chat with the model using Gradio's ChatInterface component
+ - **Action Generation**: Generate JSON actions for Android device control
+ - **Example Episodes**: Pre-loaded examples from extracted training episodes
+ - **Real-time Processing**: Optimized for real-time inference
+ - **Beautiful UI**: Modern, responsive interface with comprehensive documentation
+ - **⚡ ZeroGPU Compatible**: Dynamic GPU allocation for cost-effective deployment
+
+ ## 📋 Model Details
+
+ | Property | Value |
+ |----------|-------|
+ | **Base Model** | [LiquidAI/LFM2-VL-1.6B](https://huggingface.co/LiquidAI/LFM2-VL-1.6B) |
+ | **Architecture** | LFM2-VL (1.6B parameters) |
+ | **Fine-tuning** | LoRA (Low-Rank Adaptation) |
+ | **Training Data** | Android control episodes with screenshots and actions |
+ | **License** | Proprietary (Investment Access Required) |
+
+ ## 🚀 Quick Start
+
+ ### Prerequisites
+
+ 1. **Python 3.8+**: Ensure you have Python 3.8 or higher installed
+ 2. **Hugging Face Access**: Request access to the [L-Operator model](https://huggingface.co/Tonic/l-android-control)
+ 3. **Authentication**: Log in to Hugging Face using `huggingface-cli login`
+
+ ### Installation
+
+ 1. **Clone the repository**:
+ ```bash
+ git clone <repository-url>
+ cd l-operator-demo
+ ```
+
+ 2. **Install dependencies**:
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ 3. **Authenticate with Hugging Face**:
+ ```bash
+ huggingface-cli login
+ ```
+
+ ### Running the Demo
+
+ 1. **Start the demo**:
+ ```bash
+ python app.py
+ ```
+
+ 2. **Open your browser** and navigate to `http://localhost:7860`
+
+ 3. **Load the model** by clicking the "🚀 Load L-Operator Model" button
+
+ 4. **Upload an Android screenshot** and provide instructions
+
+ 5. **Generate actions** or use the chat interface
+
+ ## ⚡ ZeroGPU Deployment
+
+ This demo is optimized for [Hugging Face Spaces ZeroGPU](https://huggingface.co/docs/hub/spaces-zerogpu), providing dynamic GPU allocation for cost-effective deployment.
+
+ ### ZeroGPU Features
+
+ - **🆓 Free GPU Access**: Dynamic NVIDIA H200 GPU allocation
+ - **⚡ On-Demand Resources**: GPUs allocated only when needed
+ - **💰 Cost Efficient**: Optimized resource utilization
+ - **🔄 Multi-GPU Support**: Leverage multiple GPUs concurrently
+ - **🛡️ Automatic Management**: Resources released after function completion
+
+ ### ZeroGPU Specifications
+
+ | Specification | Value |
+ |---------------|-------|
+ | **GPU Type** | NVIDIA H200 slice |
+ | **Available VRAM** | 70GB per workload |
+ | **Supported Gradio** | 4+ |
+ | **Supported PyTorch** | 2.1.2, 2.2.2, 2.4.0, 2.5.1 |
+ | **Supported Python** | 3.10.13 |
+ | **Function Duration** | Up to 120 seconds per request |
+
+ ### Deploying to Hugging Face Spaces
+
+ 1. **Create a new Space** on Hugging Face:
+    - Choose **Gradio SDK**
+    - Select **ZeroGPU** in hardware options
+    - Upload your code
+
+ 2. **Space Configuration**:
+ ```yaml
+ # app.py is automatically detected
+ # requirements.txt is automatically installed
+ # ZeroGPU is automatically configured
+ ```
+
+ 3. **Access Requirements**:
+    - **Personal accounts**: PRO subscription required
+    - **Organizations**: Enterprise Hub subscription required
+    - **Usage limits**: 10 Spaces (personal) / 50 Spaces (organization)
+
+ ### ZeroGPU Integration Details
+
+ The demo automatically detects ZeroGPU availability and optimizes accordingly:
+
+ ```python
+ # Automatic ZeroGPU detection
+ try:
+     import spaces
+     ZEROGPU_AVAILABLE = True
+ except ImportError:
+     ZEROGPU_AVAILABLE = False
+
+ # GPU-decorated methods of LOperatorDemo (shown abbreviated)
+ @spaces.GPU(duration=120)  # 2 minutes for action generation
+ def generate_action(self, image, goal, instruction):
+     # GPU-accelerated inference
+     pass
+
+ @spaces.GPU(duration=90)  # 1.5 minutes for chat responses
+ def chat_with_model(self, message, history, image):
+     # Interactive chat with GPU acceleration
+     pass
+ ```
+
+ ## 🎯 How to Use
+
+ ### Basic Usage
+
+ 1. **Load Model**: Click "🚀 Load L-Operator Model" to initialize the model
+ 2. **Upload Screenshot**: Upload an Android device screenshot
+ 3. **Provide Instructions**:
+    - **Goal**: Describe what you want to achieve
+    - **Step**: Provide specific step instructions
+ 4. **Generate Action**: Click "🎯 Generate Action" to get JSON output
+
+ ### Chat Interface
+
+ 1. **Upload Screenshot**: Upload an Android screenshot
+ 2. **Send Message**: Use the structured format (parsed as sketched after this list):
+ ```
+ Goal: Open the Settings app and navigate to Display settings
+ Step: Tap on the Settings app icon on the home screen
+ ```
+ 3. **Get Response**: The model will generate JSON actions
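+
+ Under the hood, `chat_with_model` in `app.py` recovers the goal and step by scanning for these prefixes. A minimal standalone sketch of that parsing:
+
+ ```python
+ def parse_structured_message(message: str):
+     """Extract (goal, step) from a 'Goal: ...' / 'Step: ...' message."""
+     goal, step = "", ""
+     for line in message.split("\n"):
+         if line.startswith("Goal:"):
+             goal = line.replace("Goal:", "").strip()
+         elif line.startswith("Step:"):
+             step = line.replace("Step:", "").strip()
+     return goal, step
+
+ print(parse_structured_message("Goal: Open Settings\nStep: Tap the Settings icon"))
+ # ('Open Settings', 'Tap the Settings icon')
+ ```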
+
+ ### Example Episodes
+
+ The demo includes pre-loaded examples from the training episodes:
+
+ - **Episode 13**: Cruise deals app navigation
+ - **Episode 53**: Pinterest search for sustainability art
+ - **Episode 73**: Moon phases app usage
+
+ ## 📊 Expected Output Format
+
+ The model generates JSON actions in the following format (a sketch mapping these to `adb` commands follows the action-type list):
+
+ ```json
+ {
+   "action_type": "tap",
+   "x": 540,
+   "y": 1200,
+   "text": "Settings",
+   "app_name": "com.android.settings",
+   "confidence": 0.92
+ }
+ ```
+
+ ### Action Types
+
+ - `tap`: Tap at specific coordinates
+ - `click`: Click at specific coordinates
+ - `scroll`: Scroll in a direction (up/down/left/right)
+ - `input_text`: Input text
+ - `open_app`: Open a specific app
+ - `wait`: Wait for a moment
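+
+ To execute these actions on a real device, a small harness can translate each JSON action into an `adb shell input` command. Such a harness is not part of this demo; the sketch below assumes `adb` is on your PATH, a device is connected, and `app_name` resolves to a launchable package:
+
+ ```python
+ import json
+ import subprocess
+
+ def dispatch_action(action_json: str) -> None:
+     """Map a generated action to an adb command (illustrative only)."""
+     action = json.loads(action_json)
+     kind = action["action_type"]
+     if kind in ("tap", "click"):
+         cmd = ["adb", "shell", "input", "tap", str(action["x"]), str(action["y"])]
+     elif kind == "input_text":
+         # adb's `input text` expects spaces encoded as %s
+         cmd = ["adb", "shell", "input", "text", action["text"].replace(" ", "%s")]
+     elif kind == "scroll":
+         # crude fixed-distance swipe; refine per direction and screen size
+         cmd = ["adb", "shell", "input", "swipe", "540", "1500", "540", "500"]
+     elif kind == "open_app":
+         cmd = ["adb", "shell", "monkey", "-p", action["app_name"], "1"]
+     else:  # "wait" or anything unrecognized: do nothing
+         return
+     subprocess.run(cmd, check=True)
+ ```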
+
+ ## 🛠️ Technical Details
+
+ ### Model Configuration
+
+ - **Device**: Automatically detects CUDA/CPU
+ - **Precision**: bfloat16 for CUDA, float32 for CPU
+ - **Generation**: Temperature 0.7, Top-p 0.9
+ - **Max Tokens**: 128 for action generation (see the call sketch below)
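+
+ Mirroring `app.py`, these settings translate into the following `generate` call (`model` and `inputs` prepared as in the demo; the dtype is passed to `from_pretrained` at load time):
+
+ ```python
+ import torch
+
+ device = "cuda" if torch.cuda.is_available() else "cpu"
+ dtype = torch.bfloat16 if device == "cuda" else torch.float32  # used at model load
+
+ with torch.no_grad():
+     outputs = model.generate(
+         **inputs,            # tokenized chat template plus image features
+         max_new_tokens=128,  # the action JSON is short
+         do_sample=True,
+         temperature=0.7,
+         top_p=0.9,
+     )
+ ```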
+
+ ### Architecture
+
+ - **Base Model**: LFM2-VL-1.6B from LiquidAI
+ - **Fine-tuning**: LoRA with rank 16, alpha 32 (configuration sketched below)
+ - **Target Modules**: q_proj, v_proj, fc1, fc2, linear, gate_proj, up_proj, down_proj
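+
+ For reference, a `peft` `LoraConfig` matching these hyperparameters would look roughly like this (illustrative; the actual training configuration is not published):
+
+ ```python
+ from peft import LoraConfig
+
+ lora_config = LoraConfig(
+     r=16,           # LoRA rank
+     lora_alpha=32,  # scaling factor
+     target_modules=[
+         "q_proj", "v_proj", "fc1", "fc2",
+         "linear", "gate_proj", "up_proj", "down_proj",
+     ],
+     bias="none",
+ )
+ ```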
+
+ ### Performance
+
+ - **Model Size**: ~1.6B parameters
+ - **Memory Usage**: ~4GB VRAM (CUDA) / ~8GB RAM (CPU)
+ - **Inference Speed**: Optimized for real-time use
+ - **Accuracy**: 98% action accuracy on test episodes
+
+ ## 🎯 Use Cases
+
+ ### 1. Mobile App Testing
+ - Automated UI testing for Android applications
+ - Cross-device compatibility validation
+ - Regression testing with visual verification
+
+ ### 2. Accessibility Applications
+ - Voice-controlled device navigation
+ - Assistive technology integration
+ - Screen reader enhancement tools
+
+ ### 3. Remote Support
+ - Remote device troubleshooting
+ - Automated device configuration
+ - Support ticket automation
+
+ ### 4. Development Workflows
+ - UI/UX testing automation
+ - User flow validation
+ - Performance testing integration
+
+ ## ⚠️ Important Notes
+
+ ### Access Requirements
+
+ - **Investment Access**: This model is proprietary technology available exclusively to qualified investors under NDA
+ - **Authentication Required**: Must be authenticated with Hugging Face
+ - **Evaluation Only**: Access granted solely for investment evaluation purposes
+ - **Confidentiality**: All technical details are confidential
+
+ ### ZeroGPU Limitations
+
+ - **Compatibility**: Currently exclusive to the Gradio SDK
+ - **PyTorch Versions**: Limited to supported versions (2.1.2, 2.2.2, 2.4.0, 2.5.1)
+ - **Function Duration**: 60 seconds by default, customizable up to 120 seconds
+ - **Queue Priority**: PRO users get 5× more daily usage and the highest queue priority
+
+ ### General Limitations
+
+ - **Market Hours**: Some features may be limited during market hours
+ - **Device Requirements**: Requires sufficient RAM/VRAM for model loading
+ - **Network**: Requires an internet connection for model download
+ - **Authentication**: Must have approved access to the model
+
+ ## 🔧 Troubleshooting
+
+ ### Common Issues
+
+ 1. **Model Loading Error**:
+    - Ensure you're authenticated: `huggingface-cli login`
+    - Check your internet connection
+    - Verify model access approval
+
+ 2. **Memory Issues**:
+    - Use the CPU if GPU memory is insufficient
+    - Close other applications
+    - Consider using smaller batch sizes
+
+ 3. **Authentication Errors**:
+    - Re-login to Hugging Face
+    - Check access approval status
+    - Contact support if issues persist
+
+ 4. **ZeroGPU Issues**:
+    - Verify ZeroGPU is selected in Space settings
+    - Check PyTorch version compatibility
+    - Ensure function duration is within limits
+
+ ### Performance Optimization
+
+ - **GPU Usage**: Use CUDA for faster inference
+ - **Memory Management**: Monitor VRAM usage (see the snippet below)
+ - **Batch Processing**: Process multiple images efficiently
+ - **ZeroGPU Optimization**: Specify appropriate function durations
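+
+ A quick way to check VRAM from Python (standard PyTorch calls, independent of this demo):
+
+ ```python
+ import torch
+
+ if torch.cuda.is_available():
+     used_gib = torch.cuda.memory_allocated() / 1024**3
+     total_gib = torch.cuda.get_device_properties(0).total_memory / 1024**3
+     print(f"VRAM in use: {used_gib:.1f} / {total_gib:.1f} GiB")
+ ```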
+
+ ## 📞 Support
+
+ - **Investment Inquiries**: For investment-related questions and due diligence
+ - **Technical Support**: For technical issues with the demo
+ - **Model Access**: For access requests to the L-Operator model
+ - **ZeroGPU Support**: [ZeroGPU Documentation](https://huggingface.co/docs/hub/spaces-zerogpu)
+
+ ## 📄 License
+
+ This demo is provided under the same terms as the L-Operator model:
+
+ - **Proprietary Technology**: Owned by Tonic
+ - **Investment Evaluation**: Access granted solely for investment evaluation
+ - **NDA Required**: All access is subject to a Non-Disclosure Agreement
+ - **No Commercial Use**: Without written consent
+
+ ## 🙏 Acknowledgments
+
+ - **LiquidAI**: For the base LFM2-VL model
+ - **Hugging Face**: For the transformers library, hosting, and ZeroGPU infrastructure
+ - **Gradio**: For the excellent UI framework
+
+ ## 🔗 Links
+
+ - [L-Operator Model](https://huggingface.co/Tonic/l-android-control)
+ - [Base Model (LFM2-VL-1.6B)](https://huggingface.co/LiquidAI/LFM2-VL-1.6B)
+ - [ZeroGPU Documentation](https://huggingface.co/docs/hub/spaces-zerogpu)
+ - [LiquidAI](https://liquid.ai/)
+ - [Tonic](https://tonic.ai/)
+
+ ---
+
+ **Made with ❤️ by Tonic**
__pycache__/app.cpython-313.pyc ADDED
Binary file (17.4 kB)
app.py ADDED
@@ -0,0 +1,423 @@
+ import gradio as gr
+ import torch
+ from PIL import Image
+ import json
+ import os
+ from transformers import AutoProcessor, AutoModelForImageTextToText
+ from typing import List
+ import logging
+
+ # ZeroGPU detection: the `spaces` package is present on Hugging Face Spaces.
+ # Fall back to a no-op decorator locally so @spaces.GPU still works.
+ try:
+     import spaces
+     ZEROGPU_AVAILABLE = True
+ except ImportError:
+     ZEROGPU_AVAILABLE = False
+
+     class spaces:
+         @staticmethod
+         def GPU(duration=60):
+             def decorator(fn):
+                 return fn
+             return decorator
+
+ # Configure logging
+ logging.basicConfig(level=logging.INFO)
+ logger = logging.getLogger(__name__)
+
+ # Model configuration
+ MODEL_ID = "Tonic/l-android-control"
+ DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
+
+ # Get Hugging Face token from environment variable (Spaces secrets)
+ HF_TOKEN = os.getenv("HF_TOKEN")
+ if not HF_TOKEN:
+     logger.warning("HF_TOKEN not found in environment variables. Model access may be restricted.")
+
+ class LOperatorDemo:
+     def __init__(self):
+         self.model = None
+         self.processor = None
+         self.is_loaded = False
+
+     def load_model(self):
+         """Load the L-Operator model and processor"""
+         try:
+             logger.info(f"Loading model {MODEL_ID} on device {DEVICE}")
+
+             # Check if token is available
+             if not HF_TOKEN:
+                 return "❌ HF_TOKEN not found. Please set HF_TOKEN in Spaces secrets."
+
+             # Load processor with token
+             self.processor = AutoProcessor.from_pretrained(
+                 MODEL_ID,
+                 trust_remote_code=True,
+                 token=HF_TOKEN
+             )
+
+             # Load model with token
+             self.model = AutoModelForImageTextToText.from_pretrained(
+                 MODEL_ID,
+                 torch_dtype=torch.bfloat16 if DEVICE == "cuda" else torch.float32,
+                 trust_remote_code=True,
+                 device_map="auto" if DEVICE == "cuda" else None,
+                 token=HF_TOKEN
+             )
+
+             if DEVICE == "cpu":
+                 self.model = self.model.to(DEVICE)
+
+             self.is_loaded = True
+             logger.info("Model loaded successfully with token authentication")
+             return "✅ Model loaded successfully with token authentication!"
+
+         except Exception as e:
+             logger.error(f"Error loading model: {str(e)}")
+             return f"❌ Error loading model: {str(e)}"
+
+     @spaces.GPU(duration=120)  # 2 minutes for action generation
+     def generate_action(self, image: Image.Image, goal: str, instruction: str) -> str:
+         """Generate an action based on image and text inputs"""
+         if not self.is_loaded:
+             return "❌ Model not loaded. Please load the model first."
+
+         try:
+             # Convert image to RGB if needed
+             if image.mode != "RGB":
+                 image = image.convert("RGB")
+
+             # Build conversation
+             conversation = [
+                 {
+                     "role": "system",
+                     "content": [
+                         {"type": "text", "text": "You are a helpful multimodal assistant by Liquid AI."}
+                     ]
+                 },
+                 {
+                     "role": "user",
+                     "content": [
+                         {"type": "image", "image": image},
+                         {"type": "text", "text": f"Goal: {goal}\nStep: {instruction}\nRespond with a JSON action containing relevant keys (e.g., action_type, x, y, text, app_name, direction)."}
+                     ]
+                 }
+             ]
+
+             # Tokenize the chat template together with the image features
+             inputs = self.processor.apply_chat_template(
+                 conversation,
+                 add_generation_prompt=True,
+                 tokenize=True,
+                 return_dict=True,
+                 return_tensors="pt"
+             ).to(self.model.device)
+
+             # Generate response
+             with torch.no_grad():
+                 outputs = self.model.generate(
+                     **inputs,
+                     max_new_tokens=128,
+                     do_sample=True,
+                     temperature=0.7,
+                     top_p=0.9
+                 )
+
+             # Decode only the newly generated tokens
+             response = self.processor.tokenizer.decode(
+                 outputs[0][inputs["input_ids"].shape[1]:],
+                 skip_special_tokens=True
+             )
+
+             # Try to parse as JSON for better formatting
+             try:
+                 parsed_response = json.loads(response)
+                 return json.dumps(parsed_response, indent=2)
+             except json.JSONDecodeError:
+                 return response
+
+         except Exception as e:
+             logger.error(f"Error generating action: {str(e)}")
+             return f"❌ Error generating action: {str(e)}"
+
+     @spaces.GPU(duration=90)  # 1.5 minutes for chat responses
+     def chat_with_model(self, message: str, history: List[List[str]], image: Image.Image = None) -> str:
+         """Chat function for gr.ChatInterface; returns the assistant reply."""
+         if not self.is_loaded:
+             return "❌ Model not loaded. Please load the model first."
+
+         if image is None:
+             return "❌ Please upload an Android screenshot image."
+
+         try:
+             # Extract goal and instruction from the message
+             if "Goal:" in message and "Step:" in message:
+                 # Parse structured input
+                 goal = ""
+                 instruction = ""
+
+                 for line in message.split('\n'):
+                     if line.startswith("Goal:"):
+                         goal = line.replace("Goal:", "").strip()
+                     elif line.startswith("Step:"):
+                         instruction = line.replace("Step:", "").strip()
+
+                 if not goal or not instruction:
+                     return "❌ Please provide both Goal and Step in your message."
+             else:
+                 # Treat as a general instruction
+                 goal = "Complete the requested action"
+                 instruction = message
+
+             # Generate action
+             return self.generate_action(image, goal, instruction)
+
+         except Exception as e:
+             logger.error(f"Error in chat: {str(e)}")
+             return f"❌ Error: {str(e)}"
+
+ # Initialize demo
+ demo_instance = LOperatorDemo()
+
+ # Load example episodes
+ def load_example_episodes():
+     """Load example episodes from the extracted data.
+
+     Each example is [message, screenshot] to match the chat function's
+     (message, history, image) signature used by gr.ChatInterface.
+     """
+     examples = []
+
+     try:
+         for episode_id in (13, 53, 73):
+             with open(f"extracted_episodes_duckdb/episode_{episode_id}/metadata.json", "r") as f:
+                 episode = json.load(f)
+             examples.append([
+                 f"Goal: {episode['goal']}\nStep: {episode['step_instructions'][0]}",
+                 f"extracted_episodes_duckdb/episode_{episode_id}/screenshots/screenshot_1.png"
+             ])
+
+     except Exception as e:
+         logger.error(f"Error loading examples: {str(e)}")
+         examples = []
+
+     return examples
+
+ # Create Gradio interface
+ def create_demo():
+     """Create the Gradio demo interface"""
+
+     with gr.Blocks(
+         title="L-Operator: Android Device Control Demo",
+         theme=gr.themes.Soft(),
+         css="""
+         .gradio-container {
+             max-width: 1200px !important;
+         }
+         .chat-container {
+             height: 600px;
+         }
+         """
+     ) as demo:
+
+         gr.Markdown("""
+         # 🤖 L-Operator: Android Device Control Demo
+
+         **Lightweight Multimodal Android Device Control Agent**
+
+         This demo showcases the L-Operator model, a fine-tuned multimodal AI agent based on LiquidAI's LFM2-VL-1.6B model,
+         optimized for Android device control through visual understanding and action generation.
+
+         ## 🚀 How to Use
+
+         1. **Load the Model**: Click the "Load Model" button to initialize the L-Operator model
+         2. **Upload Screenshot**: Upload an Android device screenshot
+         3. **Provide Instructions**: Enter your goal and step instructions
+         4. **Get Actions**: The model will generate JSON actions for Android device control
+
+         ## 📋 Expected Output Format
+
+         The model generates JSON actions in the following format:
+         ```json
+         {
+             "action_type": "tap",
+             "x": 540,
+             "y": 1200,
+             "text": "Settings",
+             "app_name": "com.android.settings",
+             "confidence": 0.92
+         }
+         ```
+
+         ---
+         """)
+
+         with gr.Row():
+             with gr.Column(scale=1):
+                 gr.Markdown("### 🔧 Model Control")
+                 load_btn = gr.Button("🚀 Load L-Operator Model", variant="primary", size="lg")
+                 load_status = gr.Textbox(label="Model Status", value="❌ Model not loaded", interactive=False)
+
+                 # ZeroGPU status indicator
+                 gr.Markdown("### ⚡ ZeroGPU Status")
+                 if ZEROGPU_AVAILABLE:
+                     gr.Markdown("🟢 **ZeroGPU Enabled**: Dynamic GPU allocation for cost-effective inference")
+                 else:
+                     gr.Markdown("🟡 **ZeroGPU Not Available**: Running in standard mode")
+
+                 # Token status indicator
+                 gr.Markdown("### 🔐 Authentication Status")
+                 if HF_TOKEN:
+                     gr.Markdown("🟢 **Token Available**: HF_TOKEN found in environment")
+                 else:
+                     gr.Markdown("🟡 **Token Missing**: HF_TOKEN not found - set in Spaces secrets")
+
+                 gr.Markdown("### 📱 Input")
+                 image_input = gr.Image(
+                     label="Android Screenshot",
+                     type="pil",
+                     height=400,
+                     sources=["upload"]  # `tool="upload"` was removed in Gradio 4+
+                 )
+
+                 gr.Markdown("### 📝 Instructions")
+                 goal_input = gr.Textbox(
+                     label="Goal",
+                     placeholder="e.g., Open the Settings app and navigate to Display settings",
+                     lines=2
+                 )
+
+                 step_input = gr.Textbox(
+                     label="Step Instruction",
+                     placeholder="e.g., Tap on the Settings app icon on the home screen",
+                     lines=2
+                 )
+
+                 generate_btn = gr.Button("🎯 Generate Action", variant="secondary")
+
+             with gr.Column(scale=2):
+                 gr.Markdown("### 💬 Chat Interface")
+                 # Note: retry_btn / undo_btn / clear_btn / height were removed from
+                 # gr.ChatInterface in Gradio 5 (this Space pins sdk_version 5.44.0)
+                 chat_interface = gr.ChatInterface(
+                     fn=demo_instance.chat_with_model,
+                     additional_inputs=[image_input],
+                     title="L-Operator Chat",
+                     description="Chat with L-Operator using screenshots and text instructions",
+                     examples=load_example_episodes()
+                 )
+
+                 gr.Markdown("### 🎯 Action Output")
+                 action_output = gr.JSON(
+                     label="Generated Action",
+                     value={},
+                     height=200
+                 )
+
+         # Event handlers
+         def on_load_model():
+             return demo_instance.load_model()
+
+         def on_generate_action(image, goal, step):
+             if image is None:
+                 return {"error": "Please upload an image"}
+
+             if not goal or not step:
+                 return {"error": "Please provide both goal and step"}
+
+             response = demo_instance.generate_action(image, goal, step)
+
+             try:
+                 # Try to parse as JSON
+                 return json.loads(response)
+             except json.JSONDecodeError:
+                 return {"raw_response": response}
+
+         load_btn.click(
+             fn=on_load_model,
+             outputs=load_status
+         )
+
+         generate_btn.click(
+             fn=on_generate_action,
+             inputs=[image_input, goal_input, step_input],
+             outputs=action_output
+         )
+
+         # No extra wiring is needed when the image changes: image_input is already
+         # passed to the chat function via ChatInterface's additional_inputs.
+
+         gr.Markdown("""
+         ---
+
+         ## 📊 Model Details
+
+         | Property | Value |
+         |----------|-------|
+         | **Base Model** | LiquidAI/LFM2-VL-1.6B |
+         | **Architecture** | LFM2-VL (1.6B parameters) |
+         | **Fine-tuning** | LoRA (Low-Rank Adaptation) |
+         | **Training Data** | Android control episodes with screenshots and actions |
+
+         ## 🎯 Use Cases
+
+         - **Mobile App Testing**: Automated UI testing for Android applications
+         - **Accessibility Applications**: Voice-controlled device navigation
+         - **Remote Support**: Remote device troubleshooting
+         - **Development Workflows**: UI/UX testing automation
+
+         ## ⚡ ZeroGPU Integration
+
+         This demo is optimized for [Hugging Face Spaces ZeroGPU](https://huggingface.co/docs/hub/spaces-zerogpu), providing:
+
+         - **Dynamic GPU Allocation**: NVIDIA H200 GPUs allocated on-demand
+         - **Cost Efficiency**: Free GPU access with optimized resource utilization
+         - **Multi-GPU Support**: Leverage multiple GPUs concurrently
+         - **Automatic Management**: GPU resources released after function completion
+
+         ### ZeroGPU Specifications
+         - **GPU Type**: NVIDIA H200 slice
+         - **Available VRAM**: 70GB per workload
+         - **Supported Versions**: Gradio 4+, PyTorch 2.1.2/2.2.2/2.4.0/2.5.1, Python 3.10.13
+
+         ## ⚠️ Important Notes
+
+         - This model requires authentication with Hugging Face
+         - Access is restricted to qualified investors under NDA
+         - For investment evaluation purposes only
+         - Model size: ~1.6B parameters, optimized for real-time use
+         - **Token Authentication**: HF_TOKEN must be set in Spaces secrets for model access
+
+         ---
+
+         **Made with ❤️ by Tonic** | [Model on Hugging Face](https://huggingface.co/Tonic/l-android-control) | [ZeroGPU Documentation](https://huggingface.co/docs/hub/spaces-zerogpu)
+         """)
+
+     return demo
+
+ # Create and launch the demo
+ if __name__ == "__main__":
+     demo = create_demo()
+     demo.launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         share=False,
+         debug=True,
+         show_error=True,
+         ssr_mode=False
+     )
extracted_episodes_duckdb/episode_13/metadata.json ADDED
@@ -0,0 +1,36 @@
+ {
+   "episode_id": 13,
+   "goal": "On cruisedeals, I would like to view the cruise schedules for a four-night trip from New York to Canada.",
+   "actions": [
+     {
+       "action_type": "open_app",
+       "app_name": "CruiseDeals",
+       "direction": null,
+       "text": null,
+       "x": null,
+       "y": null
+     },
+     {
+       "action_type": "click",
+       "app_name": null,
+       "direction": null,
+       "text": null,
+       "x": 313,
+       "y": 708
+     },
+     {
+       "action_type": "scroll",
+       "app_name": null,
+       "direction": "down",
+       "text": null,
+       "x": null,
+       "y": null
+     }
+   ],
+   "step_instructions": [
+     "Open the cruisedeals app",
+     "Click on the suggested searched result",
+     "Swipe up to view schedules"
+   ],
+   "num_screenshots": 4
+ }
extracted_episodes_duckdb/episode_13/screenshots/screenshot_1.png ADDED
Git LFS Details
  • SHA256: d9f29a84e4f1f97009d0ab9afec2e3ac2c89ad66e404b4cb3bce3e38df2eacc7
  • Pointer size: 132 Bytes
  • Size of remote file: 1.1 MB
extracted_episodes_duckdb/episode_13/screenshots/screenshot_2.png ADDED
Git LFS Details
  • SHA256: e1594642c0437b0d8c4876abc5856ba70a6e68bceba284daf98d5725595f4256
  • Pointer size: 131 Bytes
  • Size of remote file: 394 kB
extracted_episodes_duckdb/episode_13/screenshots/screenshot_3.png ADDED
Git LFS Details
  • SHA256: 77899f4b078f860334c865b75efd2221aa34d11be30631c2b2f0e8add31962ea
  • Pointer size: 131 Bytes
  • Size of remote file: 799 kB
extracted_episodes_duckdb/episode_13/screenshots/screenshot_4.png ADDED
Git LFS Details
  • SHA256: 3b15985b775c77c0e92635194883743304662851787aa3da0750e000ba9209ec
  • Pointer size: 131 Bytes
  • Size of remote file: 170 kB
extracted_episodes_duckdb/episode_53/metadata.json ADDED
@@ -0,0 +1,63 @@
+ {
+   "episode_id": 53,
+   "goal": "Show me some of the sustainability art pieces on the Pinterest app for my research on sustainable energy.",
+   "actions": [
+     {
+       "action_type": "open_app",
+       "app_name": "Pinterest",
+       "direction": null,
+       "text": null,
+       "x": null,
+       "y": null
+     },
+     {
+       "action_type": "click",
+       "app_name": null,
+       "direction": null,
+       "text": null,
+       "x": 372,
+       "y": 2273
+     },
+     {
+       "action_type": "wait",
+       "app_name": null,
+       "direction": null,
+       "text": null,
+       "x": null,
+       "y": null
+     },
+     {
+       "action_type": "input_text",
+       "app_name": null,
+       "direction": null,
+       "text": "sustainability art pieces",
+       "x": null,
+       "y": null
+     },
+     {
+       "action_type": "click",
+       "app_name": null,
+       "direction": null,
+       "text": null,
+       "x": 994,
+       "y": 2169
+     },
+     {
+       "action_type": "wait",
+       "app_name": null,
+       "direction": null,
+       "text": null,
+       "x": null,
+       "y": null
+     }
+   ],
+   "step_instructions": [
+     "Open the pinterest app.",
+     "Click on the search icon at the bottom of the screen.",
+     "Click on the search icon at the bottom of the screen.",
+     "Type in sustainability art pieces.",
+     "Click on the search icon at the bottom-right of the keyboard.",
+     "Click on the search icon at the bottom-right of the keyboard."
+   ],
+   "num_screenshots": 7
+ }
extracted_episodes_duckdb/episode_53/screenshots/screenshot_1.png ADDED
Git LFS Details
  • SHA256: 267d14e2870c314a3bb2f3a4f5ab0990e28a3c7eb4cbf18b27faa8de695f23fe
  • Pointer size: 131 Bytes
  • Size of remote file: 122 kB
extracted_episodes_duckdb/episode_53/screenshots/screenshot_2.png ADDED
Git LFS Details
  • SHA256: 3c55410d3a7faaa56adb3c3c6cd882854d053f070b1f240a322a77ae66f07e92
  • Pointer size: 132 Bytes
  • Size of remote file: 2.1 MB
extracted_episodes_duckdb/episode_53/screenshots/screenshot_3.png ADDED
Git LFS Details
  • SHA256: 820e1a0dc1e23d3d31640aa045c389730ad9d12129d37f39b390380a653a6d39
  • Pointer size: 132 Bytes
  • Size of remote file: 2.51 MB
extracted_episodes_duckdb/episode_53/screenshots/screenshot_4.png ADDED
Git LFS Details
  • SHA256: c5411681c4c2963f5555d82ae399220e496dcb33641091b5c59ea9d15a50d7f0
  • Pointer size: 131 Bytes
  • Size of remote file: 124 kB
extracted_episodes_duckdb/episode_53/screenshots/screenshot_5.png ADDED
Git LFS Details
  • SHA256: e4024b749e249a2a21db65c1b34a390c59df3973916afc249190b368cd27d43c
  • Pointer size: 131 Bytes
  • Size of remote file: 119 kB
extracted_episodes_duckdb/episode_53/screenshots/screenshot_6.png ADDED
Git LFS Details
  • SHA256: e3c39308cf08c497cb6cc8b34022acef7b30b4c479cbca8f48df2309c92c95ac
  • Pointer size: 132 Bytes
  • Size of remote file: 2.16 MB
extracted_episodes_duckdb/episode_53/screenshots/screenshot_7.png ADDED
Git LFS Details
  • SHA256: f20811073dbfa2dd362ea1cbb21417da7172e27decc50d753665d058b38b5df7
  • Pointer size: 132 Bytes
  • Size of remote file: 2.78 MB
extracted_episodes_duckdb/episode_73/metadata.json ADDED
@@ -0,0 +1,27 @@
+ {
+   "episode_id": 73,
+   "goal": "I want look for upcoming moon phases on the Phases of the moon.",
+   "actions": [
+     {
+       "action_type": "scroll",
+       "app_name": null,
+       "direction": "right",
+       "text": null,
+       "x": null,
+       "y": null
+     },
+     {
+       "action_type": "scroll",
+       "app_name": null,
+       "direction": "right",
+       "text": null,
+       "x": null,
+       "y": null
+     }
+   ],
+   "step_instructions": [
+     "Swipe left on the screen to view upcoming phases.",
+     "Swipe left on the screen to view upcoming phases."
+   ],
+   "num_screenshots": 3
+ }
extracted_episodes_duckdb/episode_73/screenshots/screenshot_1.png ADDED
Git LFS Details
  • SHA256: 414290b0c5ea1fc832f1537f28afbcce8cf46861231aea4e8848a6e41048f1bd
  • Pointer size: 132 Bytes
  • Size of remote file: 2.21 MB
extracted_episodes_duckdb/episode_73/screenshots/screenshot_2.png ADDED
Git LFS Details
  • SHA256: 203fc14155f749ead3cbec969ab4fd9cb858bd9ea27b7c6f93f8cc345f7b80f5
  • Pointer size: 132 Bytes
  • Size of remote file: 2.57 MB
extracted_episodes_duckdb/episode_73/screenshots/screenshot_3.png ADDED
Git LFS Details
  • SHA256: da70575ff3f931eb0387a94898e997b4b665600f999c2c75a08e1f127cec6c3e
  • Pointer size: 132 Bytes
  • Size of remote file: 2.38 MB
extracted_episodes_duckdb/extraction_summary.json ADDED
@@ -0,0 +1,30 @@
+ {
+   "total_episodes_extracted": 3,
+   "output_directory": "extracted_episodes_duckdb",
+   "episodes": [
+     {
+       "episode_id": 13,
+       "goal": "On cruisedeals, I would like to view the cruise schedules for a four-night trip from New York to Canada.",
+       "num_actions": 3,
+       "num_screenshots": 4,
+       "num_steps": 3,
+       "output_directory": "extracted_episodes_duckdb\\episode_13"
+     },
+     {
+       "episode_id": 53,
+       "goal": "Show me some of the sustainability art pieces on the Pinterest app for my research on sustainable energy.",
+       "num_actions": 6,
+       "num_screenshots": 7,
+       "num_steps": 6,
+       "output_directory": "extracted_episodes_duckdb\\episode_53"
+     },
+     {
+       "episode_id": 73,
+       "goal": "I want look for upcoming moon phases on the Phases of the moon.",
+       "num_actions": 2,
+       "num_screenshots": 3,
+       "num_steps": 2,
+       "output_directory": "extracted_episodes_duckdb\\episode_73"
+     }
+   ]
+ }
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ gradio>=4.0.0
+ torch>=2.0.0
+ transformers>=4.35.0
+ Pillow>=10.0.0
+ accelerate>=0.20.0
+ huggingface-hub>=0.17.0
+ safetensors>=0.4.0
+ spaces>=0.1.0