Adaptive navigation in unfamiliar environments is crucial for household service robots, but it remains challenging because it requires both low-level path planning and high-level scene understanding. Recent zero-shot approaches based on vision-language models (VLMs) reduce the dependence on prior maps and scene-specific training data, yet they face significant limitations: spatiotemporal discontinuity caused by discrete observations, unstructured memory representations, and insufficient task understanding, all of which lead to navigation failures. We propose DORAEMON (Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation), a novel cognitively inspired, zero-shot, end-to-end framework that mimics human navigation through a Ventral Stream and a Dorsal Stream. The Dorsal Stream implements Hierarchical Semantic-Spatial Fusion and a Topology Map to handle spatiotemporal discontinuities, while the Ventral Stream combines CoDe-VLM and Exec-VLM to improve decision-making. We also develop Nav-Ensurance to ensure navigation safety and efficiency. We evaluate DORAEMON on HM3D v1, HM3D v2, and MP3D, where it achieves state-of-the-art success rate (SR) and success weighted by path length (SPL), outperforming existing methods. We also introduce a new evaluation metric, AORI, to better assess navigation intelligence. Comprehensive experiments demonstrate DORAEMON's effectiveness for zero-shot, end-to-end navigation without prior map building or pre-training.
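SR and SPL are the standard ObjectNav metrics. SPL follows the usual definition (Anderson et al., 2018): SPL = (1/N) Σ S_i · l_i / max(p_i, l_i), where S_i is 1 for a successful episode and 0 otherwise, l_i is the shortest-path distance to the goal, and p_i is the length of the path the agent actually took. The sketch below is illustrative only and is not this repository's evaluation code; the `EpisodeResult` record and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    # Hypothetical per-episode record; field names are illustrative only.
    success: bool         # S_i: whether the episode ended in success
    shortest_path: float  # l_i: geodesic distance from start to the nearest goal (m)
    agent_path: float     # p_i: length of the path the agent actually took (m)


def success_rate(results: list[EpisodeResult]) -> float:
    """SR: fraction of episodes that ended in success."""
    if not results:
        return 0.0
    return sum(r.success for r in results) / len(results)


def spl(results: list[EpisodeResult]) -> float:
    """SPL: mean over episodes of S_i * l_i / max(p_i, l_i)."""
    if not results:
        return 0.0
    total = 0.0
    for r in results:
        if r.success:
            denom = max(r.agent_path, r.shortest_path)
            # If the agent starts at the goal (l_i = p_i = 0), count the episode as fully efficient.
            total += r.shortest_path / denom if denom > 0 else 1.0
    return total / len(results)


results = [
    EpisodeResult(True, 5.0, 10.0),   # success, but took twice the shortest path
    EpisodeResult(False, 4.0, 20.0),  # failure contributes 0 to both metrics
    EpisodeResult(True, 6.0, 6.0),    # success along the shortest path
]
print(success_rate(results), spl(results))
```

Here SR is 2/3 and SPL is (0.5 + 0 + 1.0) / 3 = 0.5: SPL penalizes the first success for its detour, which SR alone does not capture.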
🔥 May 15, 2025: We've reorganized and cleaned up the repository to ensure a clear, well-structured codebase. Please give the training and inference scripts a try, and feel free to open an issue if you run into any problems. We apologize for any confusion caused by our original codebase release.
🔥 May 22, 2025: We've released some demos.
🎬Real: an orange cushion that fell on the ground

- Clone this repo.
- Create the conda environment and install all dependencies.
conda create -n doraemon python=3.9 cmake=3.14.0
conda activate doraemon
conda install habitat-sim=0.3.1 withbullet headless -c conda-forge -c aihabitat
pip install -r requirements.txt
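After installation, a quick sanity check can confirm that the interpreter matches the Python 3.9 pinned above and that habitat-sim is importable (its Python module is `habitat_sim`). This is a hypothetical helper, not part of the repository; extend `required` with other top-level packages from requirements.txt as needed.

```python
import importlib.util
import sys


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be found on this interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(required=("habitat_sim",), python=(3, 9)) -> list[str]:
    """Collect human-readable problems with the current environment (empty list means OK)."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(python):
        problems.append(
            f"expected Python {python[0]}.{python[1]}, found {sys.version.split()[0]}"
        )
    for name in missing_modules(list(required)):
        problems.append(f"module not importable: {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("[env]", issue)
    print("environment looks OK" if not issues else f"{len(issues)} issue(s) found")
```

`find_spec` only locates the module without importing it, so the check is cheap and does not initialize the simulator.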
This project is built on the Habitat simulator and uses the HM3D and MP3D datasets, which are available here. Our code expects all of this data to live in a data folder. Move the downloaded HM3D v0.1, HM3D v0.2, and MP3D folders into it using the following layout:
├── <DATASET_ROOT>
│ ├── hm3d_v0.1/
│ │ ├── val/
│ │ │ ├── 00800-TEEsavR23oF/
│ │ │ │ ├── TEEsavR23oF.navmesh
│ │ │ │ ├── TEEsavR23oF.glb
│ │ ├── hm3d_annotated_basis.scene_dataset_config.json
│ ├── objectnav_hm3d_v0.1/
│ │ ├── val/
│ │ │ ├── content/
│ │ │ │ ├── 4ok3usBNeis.json.gz
│ │ │ ├── val.json.gz
│ ├── hm3d_v0.2/
│ │ ├── val/
│ │ │ ├── 00800-TEEsavR23oF/
│ │ │ │ ├── TEEsavR23oF.basis.navmesh
│ │ │ │ ├── TEEsavR23oF.basis.glb
│ │ ├── hm3d_annotated_basis.scene_dataset_config.json
│ ├── objectnav_hm3d_v0.2/
│ │ ├── val/
│ │ │ ├── content/
│ │ │ │ ├── 4ok3usBNeis.json.gz
│ │ │ ├── val.json.gz
│ ├── mp3d/
│ │ ├── 17DRP5sb8fy/
│ │ │ ├── 17DRP5sb8fy.glb
│ │ │ ├── 17DRP5sb8fy.house
│ │ │ ├── 17DRP5sb8fy.navmesh
│ │ │ ├── 17DRP5sb8fy_semantic.ply
│ │ ├── mp3d.scene_dataset_config.json
│ ├── objectnav_mp3d/
│ │ ├── val/
│ │ │ ├── content/
│ │ │ │ ├── 2azQ1b91cZZ.json.gz
│ │ │ ├── val.json.gz
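Before launching an evaluation, it can save time to verify that the data folder matches the layout above. The sketch below is a hypothetical checker, not part of the repository: it checks only the fixed paths from the tree (scene and episode file names vary per split) and assumes the data root is passed as the first command-line argument, defaulting to `data`.

```python
import sys
from pathlib import Path

# Fixed entries from the directory tree above, relative to <DATASET_ROOT>.
# Per-scene folders and per-scene episode files vary, so they are not listed.
EXPECTED = [
    "hm3d_v0.1/val",
    "hm3d_v0.1/hm3d_annotated_basis.scene_dataset_config.json",
    "objectnav_hm3d_v0.1/val/val.json.gz",
    "objectnav_hm3d_v0.1/val/content",
    "hm3d_v0.2/val",
    "hm3d_v0.2/hm3d_annotated_basis.scene_dataset_config.json",
    "objectnav_hm3d_v0.2/val/val.json.gz",
    "objectnav_hm3d_v0.2/val/content",
    "mp3d",
    "objectnav_mp3d/val/val.json.gz",
    "objectnav_mp3d/val/content",
]


def missing_paths(dataset_root, expected=EXPECTED) -> list[str]:
    """Return the expected relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "data"
    missing = missing_paths(root)
    if missing:
        print(f"{len(missing)} expected path(s) missing under {root}:")
        for rel in missing:
            print("  -", rel)
    else:
        print(f"dataset layout under {root} looks complete")
```

Usage: `python check_data.py /path/to/DATASET_ROOT`, where `check_data.py` is whatever name you save the sketch under.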
Set your own Gemini API key before running: export GEMINI_API_KEY=xxx
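If the key is missing, VLM calls typically fail only after the simulator has started. A small guard like the one below fails fast instead. It is a hypothetical helper, not the repository's own code; only the `GEMINI_API_KEY` variable name comes from this README, and treating the placeholder `xxx` as unset is an assumption.

```python
import os


def get_gemini_api_key(env_var: str = "GEMINI_API_KEY") -> str:
    """Read the Gemini API key from the environment, failing early with a clear message."""
    key = os.environ.get(env_var, "").strip()
    # Treat an empty value or the README placeholder as "not configured".
    if not key or key == "xxx":
        raise RuntimeError(
            f"{env_var} is not set; run `export {env_var}=<your key>` before launching."
        )
    return key
```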
Run python scripts/main.py to visualize the result of an episode.
To evaluate DORAEMON, we use a framework for parallel evaluation (HM3D v0.1 contains 1000 episodes, HM3D v0.2 contains 2000, and MP3D contains 2195). The script parallel_gpu0.sh distributes K instances over N GPUs, with each instance running M episodes. A local Flask server aggregates the per-episode data, and the aggregated results are then logged to wandb, so make sure you are logged in with wandb login.
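The exact split performed by parallel_gpu0.sh is not documented here, so the following is only a hypothetical sketch of the K/N/M arithmetic: K instances each take a contiguous block of M = ceil(episodes / K) episodes, and instances are assigned to the N GPUs round-robin.

```python
from math import ceil


def plan_shards(num_episodes: int, num_instances: int, num_gpus: int) -> list[dict]:
    """Split episodes into contiguous blocks for K instances, assigned round-robin to N GPUs."""
    if num_instances <= 0 or num_gpus <= 0:
        raise ValueError("need at least one instance and one GPU")
    per_instance = ceil(num_episodes / num_instances)  # M episodes per instance
    shards = []
    for k in range(num_instances):
        start = k * per_instance
        end = min(start + per_instance, num_episodes)
        if start >= end:
            break  # more instances than episodes require; the rest would be idle
        shards.append({"instance": k, "gpu": k % num_gpus, "episodes": range(start, end)})
    return shards


# Example: the 2195 MP3D episodes over K = 8 instances on N = 4 GPUs.
for shard in plan_shards(2195, 8, 4):
    eps = shard["episodes"]
    print(f"instance {shard['instance']} -> GPU {shard['gpu']}: episodes {eps.start}..{eps.stop - 1}")
```

With these numbers M = 275, so the first seven instances run 275 episodes each and the last runs the remaining 270. Each shard could then be launched with `CUDA_VISIBLE_DEVICES` set to its GPU index; round-robin placement keeps the per-GPU load within one instance of even.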
If you find our work useful, please cite:
@misc{gu2025doraemondecentralizedontologyawarereliable,
title={DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation},
author={Tianjun Gu and Linfeng Li and Xuhong Wang and Chenghua Gong and Jingyu Gong and Zhizhong Zhang and Yuan Xie and Lizhuang Ma and Xin Tan},
year={2025},
eprint={2505.21969},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2505.21969},
}
For questions about this work, please contact:
Tianjun Gu: [email protected]
Project Page: https://grady10086.github.io/DORAEMON/