xywang626 committed
Commit 2c80820 · verified · 1 Parent(s): 0146518

Update README.md

Files changed (1):
  1. README.md +22 -22
README.md CHANGED
@@ -131,16 +131,14 @@ It also closes the gap to proprietary Claude models.
 ### GUI Grounding Performance
 <div align="center">
 
-| **Model** | **OSWorld-G** | **ScreenSpot-V2** | **ScreenSpot-Pro** |
-|-------|-----------|---------------|----------------|
-| Qwen2.5-VL-7B | 31.4 | 88.8 | 27.6 |
-| Qwen2.5-VL-32B | 46.5 | 87.0 | 39.4 |
-| UI-TARS-72B | 57.1 | 90.3 | 38.1 |
-| **OpenCUA-A3B** | 48.6 | 91.4 | 28.5 |
-| **OpenCUA-Qwen2-7B** | 45.7 | 88.5 | 23.7 |
-| **OpenCUA-7B** | 55.3 | 92.3 | 50.0 |
-| **OpenCUA-32B** | **59.6** | **93.4** | 55.3 |
-| **OpenCUA-72B** | - | 92.9 | **60.8** |
+| **Model** | **OSWorld-G** | **ScreenSpot-V2** | **ScreenSpot-Pro** | **UI-Vision** |
+|-------|-----------|---------------|----------------|---------------|
+| Qwen2.5-VL-7B | 31.4 | 88.8 | 27.6 | 0.85 |
+| Qwen2.5-VL-32B | 46.5 | 87.0 | 39.4 | - |
+| UI-TARS-72B | 57.1 | 90.3 | 38.1 | 25.5 |
+| **OpenCUA-7B** | 55.3 | 92.3 | 50.0 | 29.7 |
+| **OpenCUA-32B** | **59.6** | **93.4** | 55.3 | 33.3 |
+| **OpenCUA-72B** | 59.2 | 92.9 | **60.8** | **37.3** |
 </div>
 
 
@@ -159,7 +157,7 @@ It also closes the gap to proprietary Claude models.
 
 # 🚀 Quick Start
 <div style="border-left: 6px solid #f28c28; background: #fff8e6; padding: 12px 16px; margin: 16px 0;">
-<strong>⚠️ Important for Qwen-based Models (OpenCUA-7B, OpenCUA-32B):</strong>
+<strong>⚠️ Important for Qwen-based Models (OpenCUA-7B, OpenCUA-32B, OpenCUA-72B):</strong>
 
 To align with our training infrastructure, we have modified the model in two places:
 <ul style="margin-top: 8px;">
@@ -184,8 +182,8 @@ Download the model weight from huggingface:
 ```python
 from huggingface_hub import snapshot_download
 snapshot_download(
-    repo_id="xlangai/OpenCUA-7B",
-    local_dir="OpenCUA-7B",
+    repo_id="xlangai/OpenCUA-72B",
+    local_dir="OpenCUA-72B",
     local_dir_use_symlinks=False
 )
 ```
@@ -274,7 +272,7 @@ def run_inference(model, tokenizer, image_processor, messages, image_path):
     return output_text
 
 # Example usage
-model_path = "OpenCUA/OpenCUA-7B"  # or other model variants
+model_path = "OpenCUA/OpenCUA-72B"  # or other model variants
 image_path = "screenshot.png"
 instruction = "Click on the submit button"
 
@@ -306,13 +304,14 @@ python huggingface_inference.py
 Command for running OpenCUA-7B, OpenCUA-32B, and OpenCUA-72B in OSWorld:
 ```
 python run_multienv_opencua.py \
-    --headless \
-    --observation_type screenshot \
-    --model OpenCUA-32B \
-    --result_dir ./results --test_all_meta_path evaluation_examples/test_all_no_gdrive.json \
-    --max_steps 100 \
-    --num_envs 30 \
-    --coordinate_type qwen25
+    --headless \
+    --observation_type screenshot \
+    --model OpenCUA-72B \
+    --result_dir ./results \
+    --test_all_meta_path evaluation_examples/test_nogdrive.json \
+    --max_steps 100 \
+    --num_envs 30 \
+    --coordinate_type qwen25
 ```
 <div style="border-left: 6px solid #9ca3af; background: #f5f5f5; padding: 12px 16px; margin: 16px 0;">
 <em>Currently we only support Hugging Face inference. We are implementing vLLM support for OpenCUA models. Please stay tuned.</em>
@@ -430,6 +429,7 @@ OpenCUA models are intended for **research and educational purposes only**.
 <li><strong><code>OpenCUA/OpenCUA-Qwen2-7B</code></strong> – Relative coordinates</li>
 <li><strong><code>OpenCUA/OpenCUA-7B</code></strong> – Absolute coordinates</li>
 <li><strong><code>OpenCUA/OpenCUA-32B</code></strong> – Absolute coordinates</li>
+<li><strong><code>OpenCUA/OpenCUA-72B</code></strong> – Absolute coordinates</li>
 </ul>
 </div>
 
@@ -447,7 +447,7 @@ OpenCUA models are intended for **research and educational purposes only**.
         return abs_x, abs_y
 ```
 
-- **OpenCUA-7B and OpenCUA-32B** (Qwen2.5-based): Output **absolute coordinates** after smart resize
+- **OpenCUA-7B, OpenCUA-32B, OpenCUA-72B** (Qwen2.5-based): Output **absolute coordinates** after smart resize
 ```python
 # Example output: pyautogui.click(x=960, y=324)
 # These are coordinates on the smart-resized image, not the original image
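
The last hunk notes that the Qwen2.5-based models emit coordinates on the smart-resized image rather than on the original screenshot. As a rough illustration of what mapping such a prediction back looks like, here is a minimal sketch assuming the `smart_resize` rounding rules used by Qwen2.5-VL preprocessing (round height/width to multiples of 28, then clamp the total pixel count to a budget); the function names and the default pixel budgets here are assumptions for illustration, not part of this commit.

```python
import math

def smart_resize(height, width, factor=28, min_pixels=3136, max_pixels=12845056):
    """Round H/W to multiples of `factor`, keeping total pixels within budget.
    Mirrors Qwen2.5-VL-style image preprocessing; defaults are assumptions."""
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar

def to_original_coords(x_model, y_model, orig_w, orig_h):
    """Map a click predicted on the smart-resized image back to original pixels."""
    new_h, new_w = smart_resize(orig_h, orig_w)
    return round(x_model * orig_w / new_w), round(y_model * orig_h / new_h)

# e.g. the README's example pyautogui.click(x=960, y=324) on a 1920x1080 screenshot
print(to_original_coords(960, 324, 1920, 1080))
```

Under these assumed defaults a 1920x1080 screenshot resizes to 1932x1092, so the example click maps back to roughly (954, 320) on the original screen.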