Google TPUs documentation
Find More Examples on the Optimum-TPU GitHub Repository
To find the latest examples, visit the examples folder in the optimum-tpu repository on GitHub.
Text Generation
Learn how to perform efficient inference for text generation tasks:
- Basic Generation Script (examples/text-generation/generation.py)
  - Demonstrates text generation using models like Gemma and Mistral
  - Implements greedy sampling
  - Shows how to use static caching for improved performance
  - Includes performance measurement and timing analysis
  - Supports custom model loading and configuration
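As a rough illustration of what "greedy sampling" and "timing analysis" mean here, the sketch below picks the highest-scoring token at every step and times the loop. It is not the code from generation.py: `toy_logits`, `VOCAB`, and `greedy_generate` are hypothetical stand-ins for a real model and tokenizer, and static caching is not modeled.

```python
import time

# Hypothetical five-token vocabulary; index 0 plays the role of end-of-sequence.
VOCAB = ["<eos>", "the", "tpu", "runs", "fast"]


def toy_logits(token_ids):
    """Stand-in for a model forward pass: one score per vocabulary entry."""
    last = token_ids[-1]
    # Deterministic rule: favour the next vocabulary index, wrapping to <eos>.
    target = (last + 1) % len(VOCAB)
    return [1.0 if i == target else 0.0 for i in range(len(VOCAB))]


def greedy_generate(prompt_ids, max_new_tokens, eos_id=0):
    """Append the arg-max token at each step until <eos> or the token budget."""
    ids = list(prompt_ids)
    for _ in range(max_new_tokens):
        logits = toy_logits(ids)
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids


start = time.perf_counter()
output_ids = greedy_generate([1], max_new_tokens=10)
elapsed = time.perf_counter() - start

decoded = " ".join(VOCAB[i] for i in output_ids)
print(decoded)
print(f"generated {len(output_ids) - 1} tokens in {elapsed:.6f}s")
```

In a real script the forward pass dominates the measured time, so the timer would typically wrap only the generation call, as it does here.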
Language Model Fine-tuning
Explore how to fine-tune language models on TPU infrastructure:
- Interactive Gemma Tutorial (view in the docs)
  - Complete notebook showing the Gemma fine-tuning process
  - Covers environment setup and TPU configuration
  - Demonstrates FSDPv2 integration for efficient model sharding
  - Includes dataset preparation and PEFT/LoRA implementation
  - Provides a step-by-step training workflow
The full notebook is available at examples/language-modeling/gemma_tuning.ipynb
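To give a sense of why LoRA is attractive for fine-tuning, the sketch below compares the number of trainable parameters in a full weight matrix with the number in a low-rank update. LoRA freezes a weight of shape d_out × d_in and trains two factors of shape d_out × r and r × d_in instead. The dimensions and rank are illustrative assumptions and are not taken from the notebook; `lora_param_counts` is a hypothetical helper.

```python
def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix."""
    full = d_out * d_in            # every entry of W is trainable
    lora = rank * (d_out + d_in)   # B is d_out x r, A is r x d_in
    return full, lora


# Assumed sizes for illustration only.
full, lora = lora_param_counts(d_out=4096, d_in=4096, rank=8)
print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.4%}")
```

With these assumed sizes the low-rank update trains well under one percent of the parameters of the full matrix.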
- LLaMA Fine-tuning Guide (view in the docs)
  - Detailed guide for fine-tuning LLaMA-2 and LLaMA-3 models
  - Explains SPMD and FSDP concepts
  - Shows how to implement efficient data-parallel training
  - Includes practical code examples and prerequisites
The full notebook is available at examples/language-modeling/llama_tuning.ipynb
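The data-parallel idea mentioned above can be sketched without any TPU: each device receives an equal shard of the batch, computes a local gradient, and the gradients are averaged. The example below simulates this in plain Python for a one-parameter model. It does not use torch_xla or FSDP, and it does not show parameter sharding, which FSDP adds on top of this; `grad_mse` and `shard` are hypothetical helpers.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the one-parameter model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(seq, num_devices):
    """Split a batch into equal contiguous shards, one per simulated device."""
    size = len(seq) // num_devices
    return [seq[i * size:(i + 1) * size] for i in range(num_devices)]


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [2.0 * x for x in xs]  # targets from the "true" weight 2.0
w = 0.5                     # current weight estimate
num_devices = 4

# Each simulated device computes a gradient on its own shard.
local_grads = [
    grad_mse(w, sx, sy)
    for sx, sy in zip(shard(xs, num_devices), shard(ys, num_devices))
]

# Averaging the local gradients stands in for an all-reduce across devices.
synced_grad = sum(local_grads) / num_devices
full_grad = grad_mse(w, xs, ys)

print(synced_grad, full_grad)
```

Because the shards are equal in size, the averaged gradient matches the gradient computed on the whole batch, which is what lets data-parallel training reproduce single-device results.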
Additional Resources
- Visit the Optimum-TPU GitHub repository for more details
- Explore the Google Cloud TPU documentation for a deeper understanding of TPU architecture
To contribute to these examples, visit our GitHub repository.