🔔 DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation

✨ Abstract

Adaptive navigation in unfamiliar environments is crucial for household service robots but remains challenging because it requires both low-level path planning and high-level scene understanding. While recent vision-language model (VLM) based zero-shot approaches reduce dependence on prior maps and scene-specific training data, they face significant limitations: spatiotemporal discontinuity from discrete observations, unstructured memory representations, and insufficient task understanding, all of which lead to navigation failures. We propose DORAEMON (Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation), a novel cognitive-inspired, zero-shot, end-to-end framework consisting of Ventral and Dorsal Streams that mimics human navigation capabilities. The Dorsal Stream implements the Hierarchical Semantic-Spatial Fusion and Topology Map to handle spatiotemporal discontinuities, while the Ventral Stream combines CoDe-VLM and Exec-VLM to improve decision-making. Our approach also develops Nav-Ensurance to ensure navigation safety and efficiency. We evaluate DORAEMON on HM3Dv1, HM3Dv2, and MP3D, where it achieves state-of-the-art performance on both SR and SPL metrics, significantly outperforming existing methods. We also introduce a new evaluation metric (AORI) to better assess navigation intelligence. Comprehensive experiments demonstrate DORAEMON's effectiveness in zero-shot and end-to-end navigation without requiring prior map building or pre-training.

💥 Update

🔥 We've reorganized and cleaned up the repository to ensure a clear, well-structured codebase. Please give the training and inference scripts a try, and feel free to leave an issue if you run into any problems. We apologize for any confusion caused by our original codebase release. 5.15, 2025

🔥 We've released some demos. 5.22, 2025

📺 Demo

🛋️ SOFA Demo1

🟦 TABLE Demo2

🛏️ BED Demo3

🌳 PLANT Demo4

🗄️ CABINET Demo5

💺 CHAIR Demo6

🌳 PLANT Demo7

🛋️ SOFA Demo8

📺 TV Demo9

🚽 TOILET Demo10

🛋️ SOFA Demo11

💺 CHAIR Demo12

🎬Real: an orange cushion that fell on the ground Click to watch the demo for 'orange cushion'

🚀 Get Started

⚙️ Installation and Setup

  1. Clone this repo.

  2. Create the conda environment and install all dependencies.

    conda create -n doraemon python=3.9 cmake=3.14.0
    conda activate doraemon
    conda install habitat-sim=0.3.1 withbullet headless -c conda-forge -c aihabitat
    pip install -r requirements.txt

🛢 Prepare Dataset

This project is built on the Habitat simulator and uses the HM3D and MP3D datasets, which are available here. Our code expects all of the data above in a single data folder. Move the downloaded HM3D v0.1, HM3D v0.2, and MP3D folders into the following layout:

├── <DATASET_ROOT>
│  ├── hm3d_v0.1/
│  │  ├── val/
│  │  │  ├── 00800-TEEsavR23oF/
│  │  │  │  ├── TEEsavR23oF.navmesh
│  │  │  │  ├── TEEsavR23oF.glb
│  │  ├── hm3d_annotated_basis.scene_dataset_config.json
│  ├── objectnav_hm3d_v0.1/
│  │  ├── val/
│  │  │  ├── content/
│  │  │  │  ├── 4ok3usBNeis.json.gz
│  │  │  ├── val.json.gz
│  ├── hm3d_v0.2/
│  │  ├── val/
│  │  │  ├── 00800-TEEsavR23oF/
│  │  │  │  ├── TEEsavR23oF.basis.navmesh
│  │  │  │  ├── TEEsavR23oF.basis.glb
│  │  ├── hm3d_annotated_basis.scene_dataset_config.json
│  ├── objectnav_hm3d_v0.2/
│  │  ├── val/
│  │  │  ├── content/
│  │  │  │  ├── 4ok3usBNeis.json.gz
│  │  │  ├── val.json.gz
│  ├── mp3d/
│  │  ├── 17DRP5sb8fy/
│  │  │  ├── 17DRP5sb8fy.glb
│  │  │  ├── 17DRP5sb8fy.house
│  │  │  ├── 17DRP5sb8fy.navmesh
│  │  │  ├── 17DRP5sb8fy_semantic.ply
│  │  ├── mmp3d.scene_dataset_config.json
│  ├── objectnav_mp3d/
│  │  ├── val/
│  │  │  ├── content/
│  │  │  │  ├── 2azQ1b91cZZ.json.gz
│  │  │  ├── val.json.gz
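
Before launching an evaluation, it can help to confirm that the folder matches the layout above. The following is a minimal sketch, not part of the repository: the function name `missing_entries` and the `EXPECTED` list are hypothetical, and the list only covers dataset-level entries from the tree (individual scene folders and the MP3D scene config file are not checked).

```python
from pathlib import Path

# Paths relative to <DATASET_ROOT>, copied from the tree above.
# Assumption: checking these top-level entries is enough to catch
# a misplaced download; per-scene files are not verified here.
EXPECTED = [
    "hm3d_v0.1/val",
    "hm3d_v0.1/hm3d_annotated_basis.scene_dataset_config.json",
    "objectnav_hm3d_v0.1/val/content",
    "objectnav_hm3d_v0.1/val/val.json.gz",
    "hm3d_v0.2/val",
    "hm3d_v0.2/hm3d_annotated_basis.scene_dataset_config.json",
    "objectnav_hm3d_v0.2/val/content",
    "objectnav_hm3d_v0.2/val/val.json.gz",
    "mp3d",
    "objectnav_mp3d/val/content",
    "objectnav_mp3d/val/val.json.gz",
]


def missing_entries(dataset_root):
    """Return the expected relative paths that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_entries("/path/to/DATASET_ROOT")` returns an empty list when every listed entry is present, and otherwise the relative paths that still need to be moved into place.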

🔑 Prepare Gemini API

Set your own Gemini API key as an environment variable with export GEMINI_API_KEY=xxx
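
A missing key tends to surface only after the simulator has loaded, so an early check can save time. This is a hedged sketch rather than the repository's code: the helper name `load_gemini_api_key` is hypothetical, and how the project actually reads the variable is not shown in this README. Only the variable name `GEMINI_API_KEY` comes from the instructions above.

```python
import os


def load_gemini_api_key(env=os.environ):
    """Return the Gemini API key from the environment, or fail with a clear message.

    Hypothetical helper; `env` is injectable so the check can be tested
    without touching the real process environment.
    """
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set; run: export GEMINI_API_KEY=<your key>"
        )
    return key
```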

📈 Evaluation

Run python scripts/main.py to visualize the result of an episode.

To evaluate DORAEMON, we use a parallel evaluation framework (HM3D v0.1 contains 1000 episodes, HM3D v0.2 contains 2000 episodes, and MP3D contains 2195 episodes). The file parallel_gpu0.sh contains a script that distributes K instances over N GPUs, with each instance running M episodes. A local Flask server is started to aggregate the data, and the aggregated results are then logged to wandb. Make sure you are logged in with wandb login.
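
To make the K, N, and M parameters concrete, the sketch below plans one possible split. It is an illustration under stated assumptions, not a description of parallel_gpu0.sh: the function name `plan_instances`, the round-robin GPU assignment, and the contiguous episode ranges are all hypothetical choices.

```python
def plan_instances(num_instances, num_gpus, episodes_per_instance, total_episodes):
    """Plan K instances over N GPUs, each covering up to M contiguous episodes.

    Assumptions (not taken from the repository): instance i runs on GPU
    i % N and covers the half-open episode range [i * M, (i + 1) * M),
    clipped to the number of episodes in the dataset.
    """
    plan = []
    for i in range(num_instances):
        start = i * episodes_per_instance
        if start >= total_episodes:
            break  # no episodes left for this instance
        end = min(start + episodes_per_instance, total_episodes)
        plan.append({"instance": i, "gpu": i % num_gpus, "episodes": (start, end)})
    return plan
```

For example, with the 1000 episodes of HM3D v0.1, `plan_instances(4, 2, 250, 1000)` yields four instances alternating between GPU 0 and GPU 1, each covering 250 episodes. If K times M is smaller than the dataset size, the remaining episodes are simply left unassigned in this sketch.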

📖 Citation

If you find our work useful, please cite:

@misc{gu2025doraemondecentralizedontologyawarereliable,
      title={DORAEMON: Decentralized Ontology-aware Reliable Agent with Enhanced Memory Oriented Navigation}, 
      author={Tianjun Gu and Linfeng Li and Xuhong Wang and Chenghua Gong and Jingyu Gong and Zhizhong Zhang and Yuan Xie and Lizhuang Ma and Xin Tan},
      year={2025},
      eprint={2505.21969},
      archivePrefix={arXiv},
      primaryClass={cs.RO},
      url={https://arxiv.org/abs/2505.21969}, 
}

📫 Contact

For questions about this work, please contact:

Tianjun Gu: [email protected]

Project Page: https://grady10086.github.io/DORAEMON/
