Official implementation of "Online Robust Reinforcement Learning Through Monte-Carlo Planning" (ICML 2025).
This repository provides a robust variant of Monte Carlo Tree Search (MCTS) that addresses model ambiguity in both transition dynamics and reward distributions. Our algorithm bridges the gap between simulation-based planning and real-world deployment by incorporating distributionally robust optimization into the MCTS framework.
Key Features:
- Robust MCTS with multiple ambiguity sets (Total Variation, Chi-squared, Wasserstein); see the toy backup sketch after this list
- Non-asymptotic convergence guarantees matching standard MCTS (O(n^(-1/2)))
- Improved performance under model misspecification
- Implementation for Gambler's Problem environment
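To make the ambiguity sets concrete, here is a minimal sketch of a worst-case (robust) value backup over a Total Variation ball around an empirical transition distribution. It is illustrative only: the function name and the greedy solver are our assumptions, not the repository's API.

```python
import numpy as np

def tv_robust_value(p, v, delta):
    """Worst-case expectation of v over distributions q with TV(q, p) <= delta.

    For TV balls the inner minimum is attained by shifting up to `delta`
    probability mass from the highest-value outcomes onto the single
    lowest-value outcome (a standard greedy argument).
    """
    p = np.asarray(p, dtype=float)
    v = np.asarray(v, dtype=float)
    q = p.copy()
    sink = int(np.argmin(v))           # worst outcome receives the shifted mass
    budget = delta
    for i in np.argsort(v)[::-1]:      # donate from high-value outcomes first
        if i == sink:
            continue
        move = min(q[i], budget)
        q[i] -= move
        q[sink] += move
        budget -= move
        if budget <= 0:
            break
    return float(q @ v)

# With no ambiguity (delta=0) this is the plain expectation.
p = [0.4, 0.6]   # e.g., heads probability 0.4 in the Gambler's Problem
v = [1.0, 0.0]
print(tv_robust_value(p, v, 0.0))   # 0.4
print(tv_robust_value(p, v, 0.1))   # 0.3: mass shifted to the worse outcome
```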
Installation:

```bash
# Create and activate a conda environment
conda create -n rmcts python=3.9
conda activate rmcts

# Clone the repository
git clone https://github.com/brahimdriss/RobustMCTS.git
cd RobustMCTS

# Clone and install the rl-agents dependency
git clone https://github.com/brahimdriss/rl-agents.git
cd rl-agents
pip install -e .
cd ..

# Install required packages
pip install gym==0.26.2
```
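To sanity-check the installation, the following imports should succeed from the RobustMCTS root. The module paths follow the layout and quick-start example below; the `rl_agents` package name is an assumption based on the editable install above.

```python
# Run from the RobustMCTS repository root.
import gym                                    # pinned to 0.26.2 above
import rl_agents                              # assumed package name from `pip install -e .`
from environments.gambler import GamblerEnv   # paths as in the quick-start example
from agents.robust_mcts import RobustMCTSAgent
print("Installation looks good")
```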
Repository Structure:

```
├── agents/                  # MCTS agent implementations
│   ├── power_mcts/          # Standard Power-UCT agent
│   └── robust_mcts/         # Robust MCTS variants
├── environments/            # Environment implementations
│   └── gambler.py           # Gambler's Problem
├── experiments/             # Experiment scripts
│   └── run_comparaison.py   # Main comparison experiments
├── utils/                   # Utility functions
│   ├── parallel.py          # Parallel execution utilities
│   └── visualization.py     # Plotting and visualization
└── tests/                   # Unit tests
```
Quick Start:

```python
from environments.gambler import GamblerEnv
from agents.robust_mcts import RobustMCTSAgent

# Create environment (heads probability 0.4, target capital 10)
env = GamblerEnv(p_h=0.4, goal=10)

# Configure robust agent
config = {
    "c": 2,                             # UCT exploration constant
    "gamma": 0.99,                      # discount factor
    "power": 2,                         # power-mean parameter (Power-UCT backup)
    "budget": 2000,                     # planning budget (tree-search simulations)
    "max_depth": 50,                    # maximum search depth
    "uncertainty_type": "wasserstein",  # or "tv", "chi2"
    "uncertainty_budget": 0.5,          # radius of the ambiguity set
}

# Initialize agent
agent = RobustMCTSAgent(env, config)

# Get action
state, _ = env.reset()
action = agent.act(state)
```
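The snippet above plans a single action. Continuing from it, a full episode loop might look like the sketch below; it assumes `GamblerEnv` follows the Gym 0.26 step API (`step()` returning observation, reward, terminated, truncated, info), consistent with the pinned `gym==0.26.2`.

```python
# Roll out one episode, re-planning at every step (sketch; continues the
# quick-start snippet, so `env` and `agent` are already defined).
state, _ = env.reset()
done = False
total_reward = 0.0
while not done:
    action = agent.act(state)                                   # MCTS planning step
    state, reward, terminated, truncated, _ = env.step(action)  # gym 0.26 API
    total_reward += reward
    done = terminated or truncated
print("Episode return:", total_reward)
```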
The repository includes experiments on the Gambler's Problem, testing robustness to probability misspecification across different planning and execution environments. To run the comparison experiments:

```bash
python -m experiments.run_comparaison
```

Results are saved to the `res/` directory, with visualizations showing success rates under different levels of model mismatch.
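For a rough picture of what such a mismatch experiment measures, the sketch below plans against one heads probability and executes against another. The function name, the plan/execute split, and the success test are illustrative assumptions, not the exact protocol of `run_comparaison.py`.

```python
from environments.gambler import GamblerEnv
from agents.robust_mcts import RobustMCTSAgent

def success_rate(config, p_plan=0.4, p_exec=0.3, goal=10, n_episodes=20):
    """Plan with heads probability p_plan, execute with p_exec (sketch)."""
    wins = 0
    for ep in range(n_episodes):
        plan_env = GamblerEnv(p_h=p_plan, goal=goal)  # model handed to MCTS
        exec_env = GamblerEnv(p_h=p_exec, goal=goal)  # "true" environment
        agent = RobustMCTSAgent(plan_env, config)
        state, _ = exec_env.reset(seed=ep)
        done = False
        while not done:
            action = agent.act(state)
            state, reward, terminated, truncated, _ = exec_env.step(action)
            done = terminated or truncated
        wins += int(reward > 0)  # assumption: positive terminal reward = goal reached
    return wins / n_episodes
```

Sweeping `uncertainty_budget` in `config` and comparing against the non-robust Power-UCT agent gives the flavor of the comparisons in the saved plots.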
Citation:

```bibtex
@inproceedings{dam2025online,
  title={Online Robust Reinforcement Learning Through Monte-Carlo Planning},
  author={Tuan Quang Dam and Kishan Panaganti and Brahim Driss and Adam Wierman},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}
```

- Authors: Tuan Quang Dam, Kishan Panaganti, Brahim Driss, Adam Wierman
For questions or issues, please open an issue on GitHub or contact the authors.