PPO-SnowballTarget Reinforcement Learning Model
Model Description
This model is a Proximal Policy Optimization (PPO) agent trained to play the SnowballTarget environment from Unity ML-Agents. The agent, named Julien the Bear 🐻, learns to accurately throw snowballs at spawning targets to maximize rewards.
Model Details
Model Architecture
- Algorithm: Proximal Policy Optimization (PPO)
- Framework: Unity ML-Agents with PyTorch backend
- Agent: Julien the Bear (3D character)
- Policy Network: Actor-Critic architecture
  - Actor: Outputs action probabilities
  - Critic: Estimates state values for advantage calculation
Environment: SnowballTarget
SnowballTarget is an environment created at Hugging Face, using assets from Kay Lousberg, in which you train an agent called Julien the Bear 🐻 to hit targets with snowballs.
Environment Details:
- Objective: Train Julien the Bear to accurately throw snowballs at targets
- Setting: 3D winter environment with spawning targets
- Agent: Single agent (Julien the Bear)
- Targets: Dynamically spawning targets that need to be hit with snowballs
Observation Space
The agent observes:
- Agent's position and rotation
- Target positions and states
- Snowball trajectory information
- Environmental spatial relationships
- Ray-cast sensors for spatial awareness
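The exact observation layout is defined by the environment build itself. With recent `mlagents_envs` releases it can be inspected at runtime; a minimal sketch (the build path is illustrative):

```python
from mlagents_envs.environment import UnityEnvironment

# Connect to a local SnowballTarget build and print its observation specs
env = UnityEnvironment(file_name="./SnowballTarget", no_graphics=True)
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]
for obs_spec in spec.observation_specs:
    print(obs_spec.name, obs_spec.shape)  # e.g. ray-cast and vector sensors
env.close()
```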
Action Space
- Actions: Aiming direction and throw force
- Action Dimensions: typically 2-3 values
  - Horizontal aiming angle
  - Vertical aiming angle
  - Throw force/power
- The authoritative layout (continuous vs. discrete branches) is exposed by the environment's `action_spec` at runtime, as sketched below
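Whatever the build exposes, `mlagents_envs` can sample and send actions matching the environment's own spec; `ActionSpec.random_action` works for continuous, discrete, or hybrid action spaces. A minimal sketch (build path illustrative):

```python
from mlagents_envs.environment import UnityEnvironment

# Sample one batch of actions from the env's own action spec and step once
env = UnityEnvironment(file_name="./SnowballTarget", no_graphics=True)
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]
decision_steps, _ = env.get_steps(behavior_name)
action = spec.action_spec.random_action(len(decision_steps))
env.set_actions(behavior_name, action)
env.step()
env.close()
```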
Reward Structure
- Positive Rewards:
  - +1.0 for hitting a target
  - Distance-based reward bonuses for accurate shots
- Negative Rewards:
  - Small time penalty to encourage efficiency
  - Penalty for missing targets
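The actual reward logic lives in the environment's C# code; the sketch below is only a hypothetical Python rendering of the structure listed above, and every name and magnitude in it is an assumption:

```python
# Hypothetical rendering of the reward structure above; the real values and
# logic are defined in the environment's C# implementation
def step_reward(hit_target: bool, missed_shot: bool, dist_from_center: float) -> float:
    reward = -0.001  # small per-step time penalty (assumed magnitude)
    if hit_target:
        reward += 1.0  # base reward for a hit
        reward += max(0.0, 0.5 * (1.0 - dist_from_center))  # accuracy bonus (assumed)
    elif missed_shot:
        reward -= 0.1  # miss penalty (assumed magnitude)
    return reward
```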
Training Configuration
PPO Hyperparameters
- Algorithm: Proximal Policy Optimization (PPO)
- Training Framework: Unity ML-Agents
- Batch Size: Typical ML-Agents default (1024-2048)
- Learning Rate: Adaptive (typically 3e-4)
- Entropy Coefficient: Encourages exploration
- Value Function Coefficient: Balances actor-critic training
- PPO Clipping: ε = 0.2 (standard PPO clipping range); a representative configuration file is sketched below
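A representative `configuration.yaml` in the shape `mlagents-learn` expects, echoing the hyperparameters above; the values are illustrative, and the repository's own configuration.yaml is the source of truth:

```yaml
behaviors:
  SnowballTarget:
    trainer_type: ppo
    hyperparameters:
      batch_size: 1024
      buffer_size: 10240
      learning_rate: 3.0e-4
      learning_rate_schedule: linear
      beta: 0.005        # entropy coefficient
      epsilon: 0.2       # PPO clipping range
      lambd: 0.95        # GAE lambda
      num_epoch: 3
    network_settings:
      hidden_units: 256
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 500000
    time_horizon: 64
    summary_freq: 10000
```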
Training Process
- Environment: Unity ML-Agents SnowballTarget
- Training Method: Parallel environment instances
- Episode Length: Variable (until all targets hit or timeout)
- Success Criteria: Consistent target hitting accuracy
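Training runs through the `mlagents-learn` CLI; parallel sampling is enabled with `--num-envs`, which launches multiple concurrent instances of a built environment. A sketch with placeholder paths and run ID:

```bash
mlagents-learn ./configuration.yaml --env=./SnowballTarget --run-id=SnowballTarget1 --num-envs=4 --no-graphics
```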
Performance Metrics
The model is evaluated based on:
- Hit Accuracy: Percentage of targets successfully hit
- Average Reward: Cumulative reward per episode
- Training Stability: Consistent improvement over training steps
- Efficiency: Time to hit targets (faster is better)
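ML-Agents logs these metrics as TensorBoard summaries under `results/<run-id>`, so progress can be monitored during or after training with:

```bash
tensorboard --logdir results
```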
Expected Performance
- Target Hit Rate: >80% accuracy on target hitting
- Convergence: Stable policy after sufficient training episodes
- Generalization: Ability to hit targets in various positions
Usage
Loading the Model
```python
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

# Connect to a SnowballTarget build; the trained model ships as an .onnx
# policy file alongside the training configuration
channel = EngineConfigurationChannel()
env = UnityEnvironment(file_name="./SnowballTarget", side_channels=[channel])
```
Resume the training
```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
Running Inference
The model can be used directly in Unity ML-Agents environments, or deployed to Unity builds for real-time inference.
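For Python-side experimentation, the exported `.onnx` policy can also be opened with `onnxruntime`. Tensor names vary across ML-Agents versions, so a safe first step is introspection; a minimal sketch, assuming `onnxruntime` is installed:

```python
import onnxruntime as ort

# Open the exported policy and list its input/output tensors; names such as
# obs_0 or continuous_actions depend on the ML-Agents version that exported it
session = ort.InferenceSession("SnowballTarget.onnx")
for inp in session.get_inputs():
    print("input:", inp.name, inp.shape)
for out in session.get_outputs():
    print("output:", out.name, out.shape)
```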
Technical Implementation
PPO Algorithm Features
- Policy Clipping: Prevents large policy updates
- Advantage Estimation: GAE (Generalized Advantage Estimation)
- Value Function: Shared network with actor for efficiency
- Batch Training: Multiple parallel environments for sample efficiency
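Concretely, PPO maximizes the clipped surrogate objective of Schulman et al. (2017), with advantages estimated by GAE:

$$
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\big(r_t(\theta),\,1-\varepsilon,\,1+\varepsilon\big)\,\hat{A}_t\right)\right],
\qquad r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
$$

$$
\hat{A}_t = \sum_{l=0}^{\infty} (\gamma\lambda)^l\,\delta_{t+l},
\qquad \delta_t = r_t + \gamma V(s_{t+1}) - V(s_t)
$$

where ε = 0.2, as configured above.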
Unity ML-Agents Integration
- Python API: Training through Python interface
- Unity Side: Real-time environment simulation
- Observation Collection: Automated sensor data gathering
- Action Execution: Smooth character animation and physics
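Putting these pieces together, a minimal Python-side loop collects observations, sends actions, and accumulates the reward the environment reports per episode. Random actions stand in for the trained policy here (in practice the `.onnx` policy runs inside Unity); the build path and episode count are illustrative:

```python
from mlagents_envs.environment import UnityEnvironment

# Reset, read decision/terminal steps, act, and accumulate episode reward
env = UnityEnvironment(file_name="./SnowballTarget", no_graphics=True)
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]

for episode in range(3):
    env.reset()
    episode_reward = 0.0
    while True:
        decision_steps, terminal_steps = env.get_steps(behavior_name)
        episode_reward += decision_steps.reward.sum() + terminal_steps.reward.sum()
        if len(terminal_steps) > 0:  # the agent's episode has ended
            break
        action = spec.action_spec.random_action(len(decision_steps))
        env.set_actions(behavior_name, action)
        env.step()
    print(f"episode {episode}: reward {episode_reward:.2f}")
env.close()
```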
Files Structure
```
├── SnowballTarget.onnx    # Trained policy network
├── configuration.yaml     # Training configuration
├── run_logs/              # Training metrics and logs
└── results/               # Training results and statistics
```
Limitations and Considerations
- Environment Specific: Model is trained specifically for SnowballTarget environment
- Unity Dependency: Requires Unity ML-Agents framework for deployment
- Physics Sensitivity: Performance may vary with different physics settings
- Target Patterns: May not generalize to significantly different target spawn patterns
Applications
- Game AI: Can be integrated into Unity games as intelligent NPC behavior
- Educational: Demonstrates reinforcement learning in 3D environments
- Research: Benchmark for continuous control and aiming tasks
- Interactive Demos: Can be deployed in web builds for demonstrations
Ethical Considerations
This model represents a benign gaming scenario with no ethical concerns:
- Content: Family-friendly winter sports theme
- Violence: None; snowball throwing is a non-violent, playful activity
- Educational Value: Suitable for learning about AI and reinforcement learning
Unity ML-Agents Version Compatibility
- ML-Agents: Compatible with Unity ML-Agents toolkit
- Unity Version: Works with Unity 2021.3+ LTS
- Python Package: Requires the `mlagents` Python package
Training Environment
- Unity Editor: 3D environment simulation
- ML-Agents: Python training interface
- Hardware: GPU-accelerated training recommended
- Parallel Environments: Multiple instances for efficient training
Citation
If you use this model, please cite:
```bibtex
@misc{ppo-snowballtarget-2024,
  title={PPO-SnowballTarget: Reinforcement Learning Agent for Unity ML-Agents},
  author={Adilbai},
  year={2024},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/Adilbai/ppo-SnowballTarget}
}
```
References
- Schulman, J., et al. (2017). Proximal Policy Optimization Algorithms. arXiv preprint arXiv:1707.06347.
- Unity Technologies. Unity ML-Agents Toolkit. https://github.com/Unity-Technologies/ml-agents
- Hugging Face Deep RL Course: https://huggingface.co/learn/deep-rl-course
- Kay Lousberg (Environment Assets): https://www.kaylousberg.com/