This project implements Deep Q-Network (DQN) algorithms for value-based reinforcement learning. The implementation supports both discrete control tasks (CartPole) and Atari games with visual observations (Pong, Breakout, etc.).
- Deep Q-Network (DQN) with experience replay
- Target network for stable training
- Prioritized Experience Replay for improved sample efficiency
- Support for multiple environments: CartPole and Atari games
- Comprehensive evaluation tools with video recording
- Training progress visualization and metrics tracking
- Modular and extensible architecture
- DQN Networks: Separate architectures for CartPole (fully connected) and Atari (convolutional)
- Experience Replay: Standard and prioritized replay buffers
- Preprocessing: Frame stacking and resizing for Atari environments
- Training Loop: Epsilon-greedy exploration with decay
- Evaluation: Performance assessment and video recording
- Deep Q-Network (DQN): Basic value-based RL with neural network function approximation
- Experience Replay: Storing and sampling past experiences for stable learning
- Target Network: Separate network for computing target Q-values
- Prioritized Experience Replay: Sampling important transitions more frequently
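The interplay between experience replay and the target network can be sketched in a few lines. This is a minimal, dependency-free illustration rather than the code in `dqn_core.py`: the names `Transition`, `ReplayBuffer`, and `td_targets`, the discount `gamma=0.99`, and the stand-in `target_q` function are all assumptions made for the example.

```python
import random
from collections import deque, namedtuple

# One stored experience; the field names are illustrative.
Transition = namedtuple("Transition", "state action reward next_state done")


class ReplayBuffer:
    """Fixed-size FIFO buffer with uniform random sampling."""

    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, *args):
        self.memory.append(Transition(*args))

    def sample(self, batch_size):
        return random.sample(list(self.memory), batch_size)

    def __len__(self):
        return len(self.memory)


def td_targets(batch, target_q, gamma=0.99):
    """y = r + gamma * max_a' Q_target(s', a'), with no bootstrap on terminal steps."""
    targets = []
    for t in batch:
        bootstrap = 0.0 if t.done else gamma * max(target_q(t.next_state))
        targets.append(t.reward + bootstrap)
    return targets


# Toy usage: a stand-in target network that returns fixed Q-values (assumption).
buffer = ReplayBuffer(capacity=3)
for step in range(5):
    buffer.push(step, 0, 1.0, step + 1, step == 4)

batch = buffer.sample(2)
targets = td_targets(batch, target_q=lambda s: [0.5, 2.0])
```

Sampling uniformly from the buffer breaks the correlation between consecutive transitions, and computing the bootstrap term from a separate, slowly updated network keeps the regression target from shifting on every gradient step.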
- Clone the repository:
  git clone <repository-url>
  cd value-based-reinforcement-learning
- Install dependencies:
  pip install -r requirements.txt
- Install Atari ROMs (for Atari environments):
  python -c "import ale_py.roms as roms; roms.verify_install()"
Train a DQN agent for the CartPole environment:
python src/cartpole_dqn.py --num_episodes 1000 --save_dir ./models/cartpole
- --num_episodes: Number of training episodes (default: 1000)
- --memory_size: Replay buffer size (default: 10000)
- --batch_size: Training batch size (default: 32)
- --learning_rate: Learning rate (default: 1e-3)
- --epsilon: Initial exploration rate (default: 1.0)
- --epsilon_decay: Exploration decay rate (default: 0.995)
- --target_update_frequency: Target network update frequency (default: 100)
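For reference, the CartPole options listed above could be declared with `argparse` roughly as follows. This is a hypothetical sketch, not the parser in `src/cartpole_dqn.py`: the flag names and defaults are copied from the option list, while `build_parser` and the `None` default for `--save_dir` are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the documented CartPole flags and defaults.
    parser = argparse.ArgumentParser(description="Train a DQN agent on CartPole")
    parser.add_argument("--num_episodes", type=int, default=1000)
    parser.add_argument("--memory_size", type=int, default=10000)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--learning_rate", type=float, default=1e-3)
    parser.add_argument("--epsilon", type=float, default=1.0)
    parser.add_argument("--epsilon_decay", type=float, default=0.995)
    parser.add_argument("--target_update_frequency", type=int, default=100)
    # No default is documented for --save_dir, so None is an assumption.
    parser.add_argument("--save_dir", type=str, default=None)
    return parser


# Parsing an explicit argument list instead of sys.argv keeps the example testable.
args = build_parser().parse_args(["--num_episodes", "500", "--epsilon_decay", "0.99"])
```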
Train a DQN agent for Atari Pong:
python src/atari_dqn.py --env_name "ALE/Pong-v5" --num_episodes 2000 --save_dir ./models/pong
Train with prioritized experience replay:
python src/atari_dqn.py --env_name "ALE/Pong-v5" --use_prioritized_replay --num_episodes 2000
- --env_name: Atari environment name (default: "ALE/Pong-v5")
- --num_episodes: Number of training episodes (default: 2000)
- --memory_size: Replay buffer size (default: 100000)
- --use_prioritized_replay: Enable prioritized experience replay
- --epsilon_decay: Exploration decay rate (default: 0.99995)
- --target_update_frequency: Target network update frequency (default: 1000)
Evaluate a trained model:
python src/evaluate_model.py --model_path ./models/cartpole/cartpole_dqn.pt --num_episodes 10
Record gameplay video:
python src/evaluate_model.py \
--model_path ./models/pong/ALE_Pong-v5_dqn.pt \
--record_video \
--video_episodes 3 \
--video_path ./results/pong_gameplay.mp4
Analyze Q-value distributions:
python src/evaluate_model.py \
--model_path ./models/cartpole/cartpole_dqn.pt \
--analyze_q_values \
--plot_results
Run the complete demo script:
cd demo && chmod +x demo.sh && ./demo.sh
value-based-reinforcement-learning/
├── README.md
├── requirements.txt
├── src/
│ ├── dqn_core.py # Core DQN components and utilities
│ ├── cartpole_dqn.py # CartPole DQN implementation
│ ├── atari_dqn.py # Atari DQN implementation
│ └── evaluate_model.py # Model evaluation and testing
├── demo/
│ └── demo.sh # Quick demonstration script
├── models/ # Saved model checkpoints
└── results/ # Evaluation results and videos
- dqn_core.py: Core DQN components including networks, replay buffers, and utilities
- cartpole_dqn.py: Complete DQN implementation for the CartPole environment
- atari_dqn.py: DQN implementation for Atari games with CNN and preprocessing
- evaluate_model.py: Comprehensive evaluation tools for trained models
- Environment: CartPole-v1
- State Space: 4-dimensional continuous (position, velocity, angle, angular velocity)
- Action Space: 2 discrete actions (left, right)
- Goal: Balance pole for 500 timesteps
- Pong: ALE/Pong-v5
- Breakout: ALE/Breakout-v5
- Space Invaders: ALE/SpaceInvaders-v5
- State Space: 210x160x3 RGB images (preprocessed to 84x84 grayscale)
- Action Space: Game-specific discrete actions
- Goal: Maximize game score
CartPole:
- Target Score: 475+ (considered solved)
- Training Episodes: ~300-500 episodes to convergence
- Success Rate: >95% after training
Atari (Pong):
- Target Score: Positive average reward
- Training Episodes: 1000-2000 episodes for basic competency
- Performance: Can learn to hit the ball and score points
For comprehensive experimental analysis, training metrics, hyperparameter tuning results, and detailed technical discussion, see TECHNICAL_REPORT.md.
CartPole DQN:
Linear(4, 256) -> ReLU -> Linear(256, 256) -> ReLU -> Linear(256, 128) -> ReLU -> Linear(128, 2)
Atari DQN:
Conv2d(4, 32, 8, 4) -> ReLU -> Conv2d(32, 64, 4, 2) -> ReLU -> Conv2d(64, 64, 3, 1) -> ReLU
-> Flatten -> Linear(3136, 512) -> ReLU -> Linear(512, num_actions)
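The `Linear(3136, 512)` input size follows from the convolution arithmetic. The check below assumes 84x84 input frames, no padding, and that the `Conv2d` arguments are (in_channels, out_channels, kernel_size, stride); `conv_out` is a helper name introduced for this example.

```python
def conv_out(size, kernel, stride, padding=0):
    """Output length of one spatial dimension after a convolution."""
    return (size + 2 * padding - kernel) // stride + 1


size = 84  # preprocessed frames are 84x84
for kernel, stride in [(8, 4), (4, 2), (3, 1)]:
    size = conv_out(size, kernel, stride)  # 84 -> 20 -> 9 -> 7

flat_features = 64 * size * size  # 64 channels from the last conv layer
print(size, flat_features)  # 7 3136
```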
CartPole:
- Learning Rate: 1e-3
- Batch Size: 32
- Memory Size: 10,000
- Target Update: Every 100 steps
- Epsilon Decay: 0.995
Atari:
- Learning Rate: 1e-4
- Batch Size: 32
- Memory Size: 100,000
- Target Update: Every 1,000 steps
- Epsilon Decay: 0.99995
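To get a feel for the two decay rates, the sketch below counts how many multiplicative updates it takes for epsilon to fall from 1.0 to 0.01. The 0.01 floor and the helper name `updates_to_reach` are assumptions for illustration; the source does not state a minimum epsilon or whether the decay is applied per step or per episode.

```python
import math


def updates_to_reach(eps_start, eps_target, decay):
    """Number of multiplicative decay updates until epsilon first drops to eps_target or below."""
    return math.ceil(math.log(eps_target / eps_start) / math.log(decay))


cartpole_updates = updates_to_reach(1.0, 0.01, 0.995)
atari_updates = updates_to_reach(1.0, 0.01, 0.99995)
print(cartpole_updates, atari_updates)
```

Under these assumptions, a decay of 0.995 reaches the floor after 919 updates, while 0.99995 needs roughly 92,000, so the Atari agent explores for about a hundred times as many updates.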
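Prioritized Experience Replay:
- Samples transitions with larger TD-errors more frequently
- Applies importance-sampling weights to correct the bias introduced by non-uniform sampling
- Configurable alpha (prioritization exponent) and beta (importance-sampling correction exponent)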
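The roles of alpha and beta are easiest to see in code. The following is a simplified sketch of proportional prioritization, not the buffer shipped in `dqn_core.py`: the class name `PrioritizedReplay`, the defaults `alpha=0.6` and `beta=0.4`, and the linear-time sampling (real implementations often use a sum tree) are all assumptions.

```python
import random


class PrioritizedReplay:
    """Proportional prioritization: P(i) = p_i^alpha / sum_k p_k^alpha."""

    def __init__(self, capacity, alpha=0.6, eps=1e-5):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities = [], []
        self.pos = 0  # next slot to overwrite once the buffer is full

    def push(self, transition):
        # New transitions get the current max priority so they are likely to be sampled soon.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        # Sampling is with replacement, proportional to the scaled priorities.
        idxs = random.choices(range(len(self.data)), weights=probs, k=batch_size)
        n = len(self.data)
        # Importance-sampling weights w_i = (N * P(i))^-beta,
        # normalised by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-beta) for i in idxs]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idxs, [self.data[i] for i in idxs], weights

    def update_priorities(self, idxs, td_errors):
        # The small eps keeps every transition sampleable even when its TD-error is zero.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

With alpha = 0 the buffer degenerates to uniform sampling; with beta = 1 the weights fully compensate for the skewed sampling distribution. The returned weights would typically multiply the per-sample loss before averaging.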
Atari Preprocessing:
- Frame stacking (4 consecutive frames)
- Grayscale conversion
- Resizing to 84x84 pixels
- Normalization to [0, 1] range
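The four preprocessing steps can be sketched without any dependencies. A real pipeline would normally use NumPy or OpenCV; plain lists are used here only to keep the example self-contained. The function and class names, the luminance weights (0.299, 0.587, 0.114), and the nearest-neighbour resize are assumptions, since the source does not say which conversion or interpolation is used.

```python
from collections import deque


def to_grayscale(rgb_frame):
    """Luminance-weighted grayscale, scaled from [0, 255] to [0, 1]."""
    return [[(0.299 * r + 0.587 * g + 0.114 * b) / 255.0 for r, g, b in row]
            for row in rgb_frame]


def resize_nearest(frame, out_h=84, out_w=84):
    """Nearest-neighbour resize of a 2-D list of pixel values."""
    in_h, in_w = len(frame), len(frame[0])
    return [[frame[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def preprocess(rgb_frame):
    return resize_nearest(to_grayscale(rgb_frame))


class FrameStack:
    """Keeps the k most recent preprocessed frames as one observation."""

    def __init__(self, k=4):
        self.frames = deque(maxlen=k)

    def reset(self, frame):
        # Repeat the first frame so the stack is full from the first step.
        for _ in range(self.frames.maxlen):
            self.frames.append(frame)
        return list(self.frames)

    def step(self, frame):
        self.frames.append(frame)  # the oldest frame is dropped automatically
        return list(self.frames)


# Usage with synthetic 210x160 RGB frames.
white = [[(255, 255, 255)] * 160 for _ in range(210)]
black = [[(0, 0, 0)] * 160 for _ in range(210)]
stack = FrameStack(k=4)
obs = stack.reset(preprocess(white))
obs = stack.step(preprocess(black))  # 4 frames of 84x84, newest last
```

Stacking four frames gives the network access to motion information, such as the ball's direction in Pong, that a single frame cannot convey.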
Evaluation Metrics:
- Episode rewards and lengths
- Success rates
- Q-value analysis
- Training loss curves
- Performance visualization
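As an illustration of how such metrics might be derived from raw episode data, the sketch below summarizes a list of episode returns and smooths a training curve. The helper names `summarize` and `moving_average` and the window size are assumptions; the 475 threshold is the CartPole "solved" score mentioned earlier.

```python
from statistics import mean


def summarize(returns, lengths, solved_threshold=475.0):
    """Aggregate per-episode returns and lengths into summary metrics."""
    return {
        "mean_return": mean(returns),
        "mean_length": mean(lengths),
        # Fraction of episodes whose return meets the solved threshold.
        "success_rate": sum(r >= solved_threshold for r in returns) / len(returns),
    }


def moving_average(values, window=100):
    """Trailing mean used to smooth a noisy training curve."""
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / min(i + 1, window))
    return out


stats = summarize([500, 480, 200, 500], [500, 480, 200, 500])
smoothed = moving_average([1, 2, 3, 4], window=2)
```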
This implementation can be extended with:
- Double DQN: Reduce overestimation bias
- Dueling DQN: Separate value and advantage estimation
- Noisy Networks: Parameter space exploration
- Rainbow DQN: Combination of multiple improvements
- Multi-step learning: N-step returns
- Distributional RL: Learn value distributions
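As an example of how small such an extension can be, the sketch below contrasts the standard DQN target with the Double DQN target for a single transition. The function names, the discount `gamma=0.99`, and the plain-list Q-values are assumptions for illustration; in the actual code these would be tensors produced by the online and target networks.

```python
def dqn_target(reward, done, next_q_target, gamma=0.99):
    """Standard DQN: the target network both selects and evaluates the next action."""
    if done:
        return reward
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, done, next_q_online, next_q_target, gamma=0.99):
    """Double DQN: the online network selects the action, the target network evaluates it."""
    if done:
        return reward
    best_action = max(range(len(next_q_online)), key=next_q_online.__getitem__)
    return reward + gamma * next_q_target[best_action]


# The two networks disagree about which next action is best.
q_online = [1.0, 3.0]
q_target = [5.0, 2.0]
standard = dqn_target(0.0, False, q_target)
double = double_dqn_target(0.0, False, q_online, q_target)
```

Here the standard target bootstraps from the target network's largest estimate (about 4.95), while Double DQN uses the target network's value for the action the online network prefers (about 1.98), which is how it dampens overestimation.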