PPO-SnowballTarget Reinforcement Learning Model

Model Description

This model is a Proximal Policy Optimization (PPO) agent trained to play the SnowballTarget environment from Unity ML-Agents. The agent, named Julien the Bear 🐻, learns to accurately throw snowballs at spawning targets to maximize rewards.

Model Details

Model Architecture

  • Algorithm: Proximal Policy Optimization (PPO)
  • Framework: Unity ML-Agents with PyTorch backend
  • Agent: Julien the Bear (3D character)
  • Policy Network: Actor-Critic architecture (a minimal sketch follows this list)
    • Actor: Outputs action probabilities
    • Critic: Estimates state values for advantage calculation
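
The exact layer sizes and sensor encoders are set by the ML-Agents trainer configuration. As a rough illustration of the actor-critic split, a minimal PyTorch sketch might look like this (all sizes here are assumptions, not the trained network's):

import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Illustrative actor-critic; layer sizes are assumptions, not the trained model's."""
    def __init__(self, obs_size: int, action_size: int, hidden: int = 128):
        super().__init__()
        # Shared encoder over flattened observations
        self.encoder = nn.Sequential(
            nn.Linear(obs_size, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # Actor head: mean of a Gaussian over continuous actions
        self.actor_mean = nn.Linear(hidden, action_size)
        self.log_std = nn.Parameter(torch.zeros(action_size))
        # Critic head: scalar state-value estimate for advantage calculation
        self.critic = nn.Linear(hidden, 1)

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        dist = torch.distributions.Normal(self.actor_mean(h), self.log_std.exp())
        return dist, self.critic(h).squeeze(-1)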

Environment: SnowballTarget

SnowballTarget is an environment created at Hugging Face using assets from Kay Lousberg, in which you train an agent called Julien the Bear 🐻 to hit targets with snowballs.

Environment Details:

  • Objective: Train Julien the Bear to accurately throw snowballs at targets
  • Setting: 3D winter environment with spawning targets
  • Agent: Single agent (Julien the Bear)
  • Targets: Dynamically spawning targets that need to be hit with snowballs

Observation Space

The agent observes:

  • Agent's position and rotation
  • Target positions and states
  • Snowball trajectory information
  • Environmental spatial relationships
  • Ray-cast sensors for spatial awareness

Action Space

  • Continuous Actions: Aiming direction and throw force (see the interaction sketch after this list)
  • Action Dimensions: Typically 2-3 continuous values
    • Horizontal aiming angle
    • Vertical aiming angle
    • Throw force/power
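
The authoritative observation and action shapes live in the Unity build itself. With the low-level mlagents_envs Python API (recent ML-Agents releases), you can inspect them and drive the agent roughly as follows; the build path is a placeholder:

import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

env = UnityEnvironment(file_name="./SnowballTarget")  # placeholder build path
env.reset()
behavior_name = list(env.behavior_specs)[0]
spec = env.behavior_specs[behavior_name]
print([obs.shape for obs in spec.observation_specs])  # per-sensor observation shapes
print(spec.action_spec.continuous_size)               # number of continuous actions

decision_steps, terminal_steps = env.get_steps(behavior_name)
# Random continuous actions in [-1, 1], one row per agent awaiting a decision
actions = np.random.uniform(-1, 1,
    (len(decision_steps), spec.action_spec.continuous_size)).astype(np.float32)
env.set_actions(behavior_name, ActionTuple(continuous=actions))
env.step()
env.close()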

Reward Structure

  • Positive Rewards:
    • +1.0 for hitting a target
    • Distance-based reward bonuses for accurate shots
  • Negative Rewards:
    • Small time penalty to encourage efficiency
    • Penalty for missing targets (the full structure is sketched after this list)
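
The actual reward constants are defined inside the Unity environment; the sketch below only illustrates the structure listed above, with made-up values:

def step_reward(hit_target: bool, missed: bool, distance_bonus: float = 0.0) -> float:
    """Illustrative reward shaping; all constants are assumptions."""
    reward = -0.01                      # small per-step time penalty
    if hit_target:
        reward += 1.0 + distance_bonus  # +1.0 per hit, plus an accuracy bonus
    elif missed:
        reward -= 0.1                   # discourage wasted throws
    return reward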

Training Configuration

PPO Hyperparameters

  • Algorithm: Proximal Policy Optimization (PPO)
  • Training Framework: Unity ML-Agents
  • Batch Size: Typical ML-Agents default (1024-2048)
  • Learning Rate: Adaptive (typically 3e-4)
  • Entropy Coefficient: Encourages exploration
  • Value Function Coefficient: Balances actor-critic training
  • PPO Clipping: ε = 0.2 (standard PPO clipping range; the clipped objective is sketched below)
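
For reference, the clipped surrogate objective that ε = 0.2 parameterizes can be written in a few lines of PyTorch. This is generic textbook PPO, not ML-Agents' internal implementation:

import torch

def ppo_clip_loss(ratio: torch.Tensor, advantage: torch.Tensor, eps: float = 0.2) -> torch.Tensor:
    """ratio = pi_new(a|s) / pi_old(a|s); advantage = GAE estimate."""
    unclipped = ratio * advantage
    clipped = torch.clamp(ratio, 1.0 - eps, 1.0 + eps) * advantage
    # Negate: minimizing this loss maximizes the clipped objective
    return -torch.min(unclipped, clipped).mean()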

Training Process

  • Environment: Unity ML-Agents SnowballTarget
  • Training Method: Parallel environment instances
  • Episode Length: Variable (until all targets hit or timeout)
  • Success Criteria: Consistent target hitting accuracy

Performance Metrics

The model is evaluated on the following metrics; a small computation sketch follows the list:

  • Hit Accuracy: Percentage of targets successfully hit
  • Average Reward: Cumulative reward per episode
  • Training Stability: Consistent improvement over training steps
  • Efficiency: Time to hit targets (faster is better)
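
As a sketch, these metrics could be computed from per-episode records like the following (field names are hypothetical; the real training metrics are written to run_logs/ by the trainer):

def summarize(episodes):
    """episodes: e.g. [{"hits": 9, "spawned": 10, "reward": 8.4}, ...]
    (hypothetical field names, for illustration only)."""
    hits = sum(e["hits"] for e in episodes)
    spawned = sum(e["spawned"] for e in episodes)
    return {
        "hit_accuracy": hits / max(spawned, 1),
        "avg_reward": sum(e["reward"] for e in episodes) / len(episodes),
    }

print(summarize([{"hits": 9, "spawned": 10, "reward": 8.4}]))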

Expected Performance

  • Target Hit Rate: >80% accuracy on target hitting
  • Convergence: Stable policy after sufficient training episodes
  • Generalization: Ability to hit targets in various positions

Usage

Loading the Model

from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.side_channel.engine_configuration_channel import EngineConfigurationChannel

# Load the trained model: the distributed files include the .onnx policy file
# and the training configuration. Connect to a SnowballTarget build (example path):
channel = EngineConfigurationChannel()
env = UnityEnvironment(file_name="./SnowballTarget", side_channels=[channel])

Resuming Training

mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume

Running Inference

# The model can be used directly in Unity ML-Agents environments
# or deployed to Unity builds for real-time inference
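
Outside Unity, the exported .onnx policy can be loaded with onnxruntime for inspection. Input and output tensor names depend on the ML-Agents export, so this sketch discovers them at runtime instead of assuming them:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("SnowballTarget.onnx")

# Enumerate the tensors the export actually defines rather than hardcoding names
for inp in sess.get_inputs():
    print("input: ", inp.name, inp.shape, inp.type)
for out in sess.get_outputs():
    print("output:", out.name, out.shape)

# Smoke test: feed zeros (batch of 1) to every input, assuming float tensors
feeds = {
    inp.name: np.zeros([d if isinstance(d, int) else 1 for d in inp.shape],
                       dtype=np.float32)
    for inp in sess.get_inputs()
}
print(sess.run(None, feeds))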

Technical Implementation

PPO Algorithm Features

  • Policy Clipping: Prevents large policy updates
  • Advantage Estimation: GAE (Generalized Advantage Estimation; sketched after this list)
  • Value Function: Shared network with actor for efficiency
  • Batch Training: Multiple parallel environments for sample efficiency
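
Generalized Advantage Estimation blends one-step TD errors with an exponential λ-decay. Below is a compact reference implementation for a single uninterrupted rollout (standard GAE, not ML-Agents' internal code):

import numpy as np

def gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """A_t = sum_l (gamma*lam)^l * delta_{t+l},
    where delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)."""
    values = np.append(values, last_value)
    advantages = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages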

Unity ML-Agents Integration

  • Python API: Training through Python interface
  • Unity Side: Real-time environment simulation
  • Observation Collection: Automated sensor data gathering
  • Action Execution: Smooth character animation and physics

Files Structure

β”œβ”€β”€ SnowballTarget.onnx          # Trained policy network
β”œβ”€β”€ configuration.yaml          # Training configuration
β”œβ”€β”€ run_logs/                   # Training metrics and logs
└── results/                    # Training results and statistics

Limitations and Considerations

  1. Environment Specific: Model is trained specifically for SnowballTarget environment
  2. Unity Dependency: Requires Unity ML-Agents framework for deployment
  3. Physics Sensitivity: Performance may vary with different physics settings
  4. Target Patterns: May not generalize to significantly different target spawn patterns

Applications

  • Game AI: Can be integrated into Unity games as intelligent NPC behavior
  • Educational: Demonstrates reinforcement learning in 3D environments
  • Research: Benchmark for continuous control and aiming tasks
  • Interactive Demos: Can be deployed in web builds for demonstrations

Ethical Considerations

This model represents a benign gaming scenario and raises no notable ethical concerns:

  • Content: Family-friendly winter sports theme
  • Violence: None; the agent throws snowballs at inanimate targets
  • Educational Value: Suitable for learning about AI and reinforcement learning

Unity ML-Agents Version Compatibility

  • ML-Agents: Trained with and deployable through the Unity ML-Agents toolkit
  • Unity Version: Works with Unity 2021.3+ LTS
  • Python Package: Requires mlagents Python package

Training Environment

  • Unity Editor: 3D environment simulation
  • ML-Agents: Python training interface
  • Hardware: GPU-accelerated training recommended
  • Parallel Environments: Multiple instances for efficient training

Citation

If you use this model, please cite:

@misc{ppo-snowballtarget-2024,
  title={PPO-SnowballTarget: Reinforcement Learning Agent for Unity ML-Agents},
  author={Adilbai},
  year={2024},
  publisher={Hugging Face Hub},
  url={https://huggingface.co/Adilbai/ppo-SnowballTarget}
}
