---
pipeline_tag: robotics
library_name: transformers
license: mit
---

This repository contains models for the VLN-PE Benchmark, as presented in the paper "Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities".

VLN-PE introduces a physically realistic Vision-and-Language Navigation platform supporting humanoid, quadruped, and wheeled robots, and systematically evaluates several ego-centric VLN methods in physical robotic settings.

For more details, visit the project page or the main GitHub repository.
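As a minimal sketch of how the checkpoints in this repository could be fetched locally, the snippet below wraps `snapshot_download` from the `huggingface_hub` package. The helper name `download_checkpoints` and the default `local_dir` are illustrative assumptions, not part of the VLN-PE code. The repository id is left as a required argument because the namespace is not stated in this card.

```python
from pathlib import Path


def download_checkpoints(repo_id: str, local_dir: str = "checkpoints/vln_pe") -> Path:
    """Download every file of a Hub model repository and return the local folder.

    `repo_id` must be supplied by the caller in the form "<namespace>/VLN-PE";
    the namespace is not given in this card, so it is deliberately not hard-coded.
    """
    # Imported lazily so this module can be imported without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # snapshot_download returns the path of the folder holding the downloaded files.
    path = snapshot_download(repo_id=repo_id, repo_type="model", local_dir=local_dir)
    return Path(path)
```

A call would look like `download_checkpoints("<namespace>/VLN-PE")`, with `<namespace>` replaced by the account that hosts this repository. How the downloaded files are then loaded depends on the training code in the main GitHub repository and is not covered here.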

VLN-PE Benchmark

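| Model | Dataset/Benchmark | Val Seen TL | Val Seen NE | Val Seen FR | Val Seen StR | Val Seen OS | Val Seen SR | Val Seen SPL | Val Unseen TL | Val Unseen NE | Val Unseen FR | Val Unseen StR | Val Unseen OS | Val Unseen SR | Val Unseen SPL | Download |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **Zero-shot transfer evaluation from VLN-CE** | | | | | | | | | | | | | | | | |
| Seq2Seq-Full | R2R VLN-PE | 7.80 | 7.62 | 20.21 | 3.04 | 19.3 | 15.2 | 12.79 | 7.73 | 7.18 | 18.04 | 3.04 | 22.42 | 16.48 | 14.11 | model |
| CMA-Full | R2R VLN-PE | 6.62 | 7.37 | 20.06 | 3.95 | 18.54 | 16.11 | 14.61 | 6.58 | 7.09 | 17.07 | 3.79 | 20.86 | 16.93 | 15.24 | model |
| **Train on VLN-PE** | | | | | | | | | | | | | | | | |
| Seq2Seq | R2R VLN-PE | 10.61 | 7.53 | 27.36 | 4.26 | 32.67 | 19.75 | 14.68 | 10.85 | 7.88 | 26.8 | 5.57 | 28.13 | 15.14 | 10.77 | model |
| CMA | R2R VLN-PE | 11.13 | 7.59 | 23.71 | 3.19 | 34.94 | 21.58 | 16.1 | 11.16 | 7.98 | 22.64 | 3.27 | 33.11 | 19.15 | 14.05 | model |
| RDP | R2R VLN-PE | 13.26 | 6.76 | 27.51 | 1.82 | 38.6 | 25.08 | 17.07 | 12.7 | 6.72 | 24.57 | 3.11 | 36.9 | 25.24 | 17.73 | model |
| Seq2Seq+ | R2R VLN-PE | 10.22 | 7.75 | 33.43 | 3.19 | 30.09 | 16.86 | 12.54 | 9.88 | 7.85 | 26.27 | 6.52 | 28.79 | 16.56 | 12.7 | model |
| CMA+ | R2R VLN-PE | 8.86 | 7.14 | 23.56 | 3.5 | 36.17 | 25.84 | 21.75 | 8.79 | 7.26 | 21.75 | 3.27 | 31.4 | 22.12 | 18.65 | model |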
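
To make the table easier to compare, here is a small sketch that ranks the models and measures the gap between the seen and unseen splits. It uses only the SR (success rate) and SPL (success weighted by path length) values copied from the table above. The names `RESULTS`, `best_by`, and `sr_gap` are hypothetical helpers introduced for illustration and are not part of the VLN-PE or InternNav code.

```python
# (SR, SPL) in percent for each split, copied from the benchmark table above.
RESULTS = {
    "Seq2Seq-Full": {"seen": (15.2, 12.79), "unseen": (16.48, 14.11)},
    "CMA-Full": {"seen": (16.11, 14.61), "unseen": (16.93, 15.24)},
    "Seq2Seq": {"seen": (19.75, 14.68), "unseen": (15.14, 10.77)},
    "CMA": {"seen": (21.58, 16.1), "unseen": (19.15, 14.05)},
    "RDP": {"seen": (25.08, 17.07), "unseen": (25.24, 17.73)},
    "Seq2Seq+": {"seen": (16.86, 12.54), "unseen": (16.56, 12.7)},
    "CMA+": {"seen": (25.84, 21.75), "unseen": (22.12, 18.65)},
}

# Position of each metric inside the (SR, SPL) tuples.
METRIC_INDEX = {"SR": 0, "SPL": 1}


def best_by(split: str, metric: str) -> str:
    """Return the model with the highest value of `metric` on `split`."""
    idx = METRIC_INDEX[metric]
    return max(RESULTS, key=lambda name: RESULTS[name][split][idx])


def sr_gap(model: str) -> float:
    """Val Seen SR minus Val Unseen SR, in percentage points."""
    seen_sr = RESULTS[model]["seen"][0]
    unseen_sr = RESULTS[model]["unseen"][0]
    return round(seen_sr - unseen_sr, 2)


if __name__ == "__main__":
    print("Best Val Unseen SR: ", best_by("unseen", "SR"))
    print("Best Val Unseen SPL:", best_by("unseen", "SPL"))
    for name in RESULTS:
        print(f"{name:13s} seen-unseen SR gap: {sr_gap(name):+.2f}")
```

On these numbers, RDP has the highest Val Unseen SR (25.24) and CMA+ has the highest Val Unseen SPL (18.65), so the ranking depends on which metric is chosen.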

Citation

If you find our work helpful, please cite:

@inproceedings{vlnpe,
  title={Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities},
  author={Wang, Liuyi and Xia, Xinyuan and Zhao, Hui and Wang, Hanqing and Wang, Tai and Chen, Yilun and Liu, Chengju and Chen, Qijun and Pang, Jiangmiao},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  year={2025}
}
@misc{internnav2025,
    title = {{InternNav: InternRobotics'} open platform for building generalized navigation foundation models},
    author = {InternNav Contributors},
    howpublished={\url{https://github.com/InternRobotics/InternNav}},
    year = {2025}
}