Improve model card: Add metadata and links
This PR enhances the model card for the VLN-PE benchmark and models.
Key improvements include:
- **Metadata**: Added `pipeline_tag: robotics`, `library_name: transformers`, and `license: mit` for better discoverability and standardized information (the exact front matter is reproduced after this list).
- **Links**: Added explicit links to the associated research paper, project page, and the main GitHub repository within the model card content. This provides users with direct access to more detailed information.
- **Content**: The existing benchmark results table remains unchanged, ensuring no disruption to current information. A citation section has also been added based on the GitHub README.
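
For reference, the front matter added at the top of `README.md` (shown in full in the diff below) is:

```yaml
---
pipeline_tag: robotics
library_name: transformers
license: mit
---
```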
Please review and merge if these improvements are satisfactory.
README.md
CHANGED
@@ -1,4 +1,16 @@
-
+---
+pipeline_tag: robotics
+library_name: transformers
+license: mit
+---
+
+This repository contains models for the **VLN-PE Benchmark**, as presented in the paper [Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities](https://huggingface.co/papers/2507.13019).
+
+VLN-PE introduces a physically realistic Vision-and-Language Navigation platform supporting humanoid, quadruped, and wheeled robots, and systematically evaluates several ego-centric VLN methods in physical robotic settings.
+
+For more details, visit the [project page](https://crystalsixone.github.io/vln_pe.github.io/) or the main [GitHub repository](https://github.com/InternRobotics/InternNav).
+
+## VLN-PE Benchmark
 <style type="text/css">
 .tg {border-collapse:collapse;border-spacing:0;}
 .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
@@ -173,4 +185,22 @@
 <td class="tg-0pky">18.65</td>
 <td class="tg-0pky"><a href="https://huggingface.co/InternRobotics/VLN-PE/tree/main/r2r/fine_tuned/cma_plus" target="_blank" rel="noopener noreferrer">model</a></td>
 </tr>
-</tbody></table>
+</tbody></table>
+
+## Citation
+If you find our work helpful, please cite:
+
+```bibtex
+@inproceedings{vlnpe,
+  title={Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities},
+  author={Wang, Liuyi and Xia, Xinyuan and Zhao, Hui and Wang, Hanqing and Wang, Tai and Chen, Yilun and Liu, Chengju and Chen, Qijun and Pang, Jiangmiao},
+  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
+  year={2025}
+}
+@misc{internnav2025,
+  title = {{InternNav: InternRobotics'} open platform for building generalized navigation foundation models},
+  author = {InternNav Contributors},
+  howpublished={\url{https://github.com/InternRobotics/InternNav}},
+  year = {2025}
+}
+```
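
For context, the checkpoints linked in the benchmark table can be fetched with `huggingface_hub`. The sketch below is illustrative only: the repo id `InternRobotics/VLN-PE` and the `r2r/fine_tuned/cma_plus` subfolder are taken from the links in the card, while the local directory name is a placeholder.

```python
# Minimal sketch: download one checkpoint listed in the benchmark table
# (the fine-tuned CMA+ weights under r2r/fine_tuned/cma_plus) from the
# InternRobotics/VLN-PE repository on the Hugging Face Hub.
# Assumes `huggingface_hub` is installed (pip install huggingface_hub).
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="InternRobotics/VLN-PE",
    allow_patterns=["r2r/fine_tuned/cma_plus/*"],  # fetch only this checkpoint folder
    local_dir="vln_pe_checkpoints",                # placeholder local target directory
)
print(f"Checkpoint files downloaded to: {local_path}")
```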