nielsr (HF Staff) committed
Commit e09e26f · verified · 1 Parent(s): d39ca20

Improve model card: Add metadata and links


This PR enhances the model card for the VLN-PE benchmark and models.

Key improvements include:
- **Metadata**: Added `pipeline_tag: robotics`, `library_name: transformers`, and `license: mit` for better discoverability and standardized information (see the short sketch after this list).
- **Links**: Added explicit links to the associated research paper, project page, and the main GitHub repository within the model card content. This provides users with direct access to more detailed information.
- **Content**: The existing benchmark results table remains unchanged, ensuring no disruption to current information. A citation section has also been added based on the GitHub README.
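
As a rough illustration of the discoverability point in the first bullet (not part of the diff below): the added `pipeline_tag` is exposed as a repo tag on the Hub, so the repository can be located programmatically. A minimal sketch, assuming the `huggingface_hub` Python client is installed:

```python
# Hypothetical usage sketch (not included in this PR): the `pipeline_tag: robotics`
# metadata is surfaced as a repo tag, so tag filtering on the Hub lists the repo.
from huggingface_hub import HfApi

api = HfApi()
for model in api.list_models(filter="robotics", search="VLN-PE", limit=10):
    print(model.id)  # e.g. InternRobotics/VLN-PE
```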

Please review and merge if these improvements are satisfactory.

Files changed (1)
  1. README.md +32 -2
README.md CHANGED
@@ -1,4 +1,16 @@
- **VLN-PE Benchmark**
+ ---
+ pipeline_tag: robotics
+ library_name: transformers
+ license: mit
+ ---
+
+ This repository contains models for the **VLN-PE Benchmark**, as presented in the paper [Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities](https://huggingface.co/papers/2507.13019).
+
+ VLN-PE introduces a physically realistic Vision-and-Language Navigation platform supporting humanoid, quadruped, and wheeled robots, and systematically evaluates several ego-centric VLN methods in physical robotic settings.
+
+ For more details, visit the [project page](https://crystalsixone.github.io/vln_pe.github.io/) or the main [GitHub repository](https://github.com/InternRobotics/InternNav).
+
+ ## VLN-PE Benchmark
  <style type="text/css">
  .tg {border-collapse:collapse;border-spacing:0;}
  .tg td{border-color:black;border-style:solid;border-width:1px;font-family:Arial, sans-serif;font-size:14px;
@@ -173,4 +185,22 @@
  <td class="tg-0pky">18.65</td>
  <td class="tg-0pky"><a href="https://huggingface.co/InternRobotics/VLN-PE/tree/main/r2r/fine_tuned/cma_plus" target="_blank" rel="noopener noreferrer">model</a></td>
  </tr>
- </tbody></table>
+ </tbody></table>
+
+ ## Citation
+ If you find our work helpful, please cite:
+
+ ```bibtex
+ @inproceedings{vlnpe,
+   title={Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities},
+   author={Wang, Liuyi and Xia, Xinyuan and Zhao, Hui and Wang, Hanqing and Wang, Tai and Chen, Yilun and Liu, Chengju and Chen, Qijun and Pang, Jiangmiao},
+   booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
+   year={2025}
+ }
+ @misc{internnav2025,
+   title = {{InternNav: InternRobotics'} open platform for building generalized navigation foundation models},
+   author = {InternNav Contributors},
+   howpublished={\url{https://github.com/InternRobotics/InternNav}},
+   year = {2025}
+ }
+ ```
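
For reference when reviewing, the checkpoints linked in the benchmark table (for example the `r2r/fine_tuned/cma_plus` folder) can be fetched programmatically. A minimal sketch, assuming the `huggingface_hub` client; only the repo id and subfolder path that already appear in the card's links are used:

```python
# Hedged sketch (assumes huggingface_hub is installed): download only the CMA+
# checkpoint folder referenced by the "model" link in the benchmark table.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="InternRobotics/VLN-PE",
    allow_patterns=["r2r/fine_tuned/cma_plus/*"],  # subfolder path from the card link
)
print("Checkpoint files downloaded to:", local_dir)
```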