Update README.md
README.md CHANGED
@@ -131,16 +131,14 @@ It also closes the gap to proprietary Claude models.
 ### GUI Grounding Performance
 <div align="center">
 
-| **Model** | **OSWorld-G** | **ScreenSpot-V2** | **ScreenSpot-Pro** |
-|-------|-----------|---------------|----------------|
-| Qwen2.5-VL-7B | 31.4 | 88.8 | 27.6 |
-| Qwen2.5-VL-32B | 46.5 | 87.0 | 39.4 |
-| UI-TARS-72B | 57.1 | 90.3 | 38.1 |
-| **OpenCUA-
-| **OpenCUA-
-| **OpenCUA-
-| **OpenCUA-32B** | **59.6** | **93.4** | 55.3 |
-| **OpenCUA-72B** | - | 92.9 | **60.8** |
+| **Model** | **OSWorld-G** | **ScreenSpot-V2** | **ScreenSpot-Pro** | **UI-Vision** |
+|-------|-----------|---------------|----------------|----------------|
+| Qwen2.5-VL-7B | 31.4 | 88.8 | 27.6 | 0.85 |
+| Qwen2.5-VL-32B | 46.5 | 87.0 | 39.4 | - |
+| UI-TARS-72B | 57.1 | 90.3 | 38.1 | 25.5 |
+| **OpenCUA-7B** | 55.3 | 92.3 | 50.0 | 29.7 |
+| **OpenCUA-32B** | **59.6** | **93.4** | 55.3 | 33.3 |
+| **OpenCUA-72B** | 59.2 | 92.9 | **60.8** | **37.3** |
 </div>
 
 
@@ -159,7 +157,7 @@ It also closes the gap to proprietary Claude models.
 
 # 🚀 Quick Start
 <div style="border-left: 6px solid #f28c28; background: #fff8e6; padding: 12px 16px; margin: 16px 0;">
-<strong>⚠️ Important for Qwen-based Models (OpenCUA-7B, OpenCUA-32B):</strong>
+<strong>⚠️ Important for Qwen-based Models (OpenCUA-7B, OpenCUA-32B, OpenCUA-72B):</strong>
 
 To align with our training infrastructure, we have modified the model in two places:
 <ul style="margin-top: 8px;">
@@ -184,8 +182,8 @@ Download the model weights from Hugging Face:
 ```python
 from huggingface_hub import snapshot_download
 snapshot_download(
-    repo_id="xlangai/OpenCUA-
-    local_dir="OpenCUA-
+    repo_id="xlangai/OpenCUA-72B",
+    local_dir="OpenCUA-72B",
     local_dir_use_symlinks=False
 )
 ```
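After the download, loading along the following lines should work. This is a minimal sketch rather than the repo's official script; it assumes the checkpoint ships custom modeling code (hence `trust_remote_code=True`):

```python
from transformers import AutoImageProcessor, AutoModelForCausalLM, AutoTokenizer

# Minimal loading sketch (assumed flow, not the official script):
# OpenCUA checkpoints carry custom modeling code, so trust_remote_code=True.
model_path = "OpenCUA-72B"  # the local_dir used in snapshot_download above
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",   # pick bf16/fp16 automatically where supported
    device_map="auto",    # shard across available GPUs
    trust_remote_code=True,
)
image_processor = AutoImageProcessor.from_pretrained(model_path, trust_remote_code=True)
```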
@@ -274,7 +272,7 @@ def run_inference(model, tokenizer, image_processor, messages, image_path):
     return output_text
 
 # Example usage
-model_path = "OpenCUA/OpenCUA-
+model_path = "OpenCUA/OpenCUA-72B" # or other model variants
 image_path = "screenshot.png"
 instruction = "Click on the submit button"
 
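For context, one way the example usage above might be wired into a full call. The `run_inference` signature comes from the hunk header; the exact `messages` schema is an assumption here, so treat the role/content fields as illustrative:

```python
# Hypothetical glue code: model, tokenizer, and image_processor come from the
# loading step above; the messages format below is assumed, not confirmed by the diff.
messages = [
    {"role": "system", "content": "You are a GUI agent."},  # assumed system prompt
    {"role": "user", "content": [
        {"type": "image", "image": image_path},  # the screenshot
        {"type": "text", "text": instruction},   # the task instruction
    ]},
]
output_text = run_inference(model, tokenizer, image_processor, messages, image_path)
print(output_text)  # expected: a pyautogui action, e.g. pyautogui.click(x=..., y=...)
```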
@@ -306,13 +304,14 @@ python huggingface_inference.py
 Command for running OpenCUA models (e.g., OpenCUA-72B) in OSWorld:
 ```bash
 python run_multienv_opencua.py \
-
-
-
-
-
-
-
+    --headless \
+    --observation_type screenshot \
+    --model OpenCUA-72B \
+    --result_dir ./results \
+    --test_all_meta_path evaluation_examples/test_nogdrive.json \
+    --max_steps 100 \
+    --num_envs 30 \
+    --coordinate_type qwen25
 ```
 <div style="border-left: 6px solid #9ca3af; background: #f5f5f5; padding: 12px 16px; margin: 16px 0;">
 <em>Currently we only support Hugging Face inference. We are implementing vLLM support for OpenCUA models. Please stay tuned.</em>
@@ -430,6 +429,7 @@ OpenCUA models are intended for **research and educational purposes only**.
 <li><strong><code>OpenCUA/OpenCUA-Qwen2-7B</code></strong> → Relative coordinates</li>
 <li><strong><code>OpenCUA/OpenCUA-7B</code></strong> → Absolute coordinates</li>
 <li><strong><code>OpenCUA/OpenCUA-32B</code></strong> → Absolute coordinates</li>
+<li><strong><code>OpenCUA/OpenCUA-72B</code></strong> → Absolute coordinates</li>
 </ul>
 </div>
 
@@ -447,7 +447,7 @@ OpenCUA models are intended for **research and educational purposes only**.
     return abs_x, abs_y
 ```
 
-- **OpenCUA-7B
+- **OpenCUA-7B, OpenCUA-32B, OpenCUA-72B** (Qwen2.5-based): Output **absolute coordinates** after smart resize
 ```python
 # Example output: pyautogui.click(x=960, y=324)
 # These are coordinates on the smart-resized image, not the original image
```
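Because the Qwen2.5-based checkpoints emit coordinates on the smart-resized image, a back-mapping step is needed before acting on the real screen. The sketch below follows the rounding scheme of Qwen2.5-VL's `smart_resize` (sides rounded to multiples of 28 within a pixel budget); the default pixel limits here are assumptions:

```python
import math

def smart_resize(height, width, factor=28, min_pixels=3136, max_pixels=12845056):
    # Round each side to a multiple of `factor` while keeping the
    # total pixel count inside [min_pixels, max_pixels].
    h_bar = round(height / factor) * factor
    w_bar = round(width / factor) * factor
    if h_bar * w_bar > max_pixels:
        beta = math.sqrt((height * width) / max_pixels)
        h_bar = math.floor(height / beta / factor) * factor
        w_bar = math.floor(width / beta / factor) * factor
    elif h_bar * w_bar < min_pixels:
        beta = math.sqrt(min_pixels / (height * width))
        h_bar = math.ceil(height * beta / factor) * factor
        w_bar = math.ceil(width * beta / factor) * factor
    return h_bar, w_bar

def model_xy_to_screen(x, y, orig_width, orig_height):
    # Map a coordinate predicted on the smart-resized image back to
    # the original screenshot resolution before calling pyautogui.
    new_h, new_w = smart_resize(orig_height, orig_width)
    return round(x * orig_width / new_w), round(y * orig_height / new_h)

# e.g. on a 2560x1440 screenshot, the model's (960, 324) maps to a
# slightly different point in the original resolution
print(model_xy_to_screen(960, 324, 2560, 1440))
```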