ainz/tiny-recursive-model

@@ -1,199 +1,189 @@
 ---
 library_name: transformers
-tags: []
 ---
-# Model Card for Model ID
-<!-- Provide a quick summary of what the model is/does. -->
 ## Model Details
-### Model Description
-<!-- Provide a longer summary of what this model is. -->
-This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
-- **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
-### Model Sources [optional]
-<!-- Provide the basic links for the model. -->
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
-## Uses
-<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-### Direct Use
-<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-[More Information Needed]
-### Downstream Use [optional]
-<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-[More Information Needed]
-### Out-of-Scope Use
-<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-[More Information Needed]
-## Bias, Risks, and Limitations
-<!-- This section is meant to convey both technical and sociotechnical limitations. -->
-[More Information Needed]
-### Recommendations
-<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
-Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
-## How to Get Started with the Model
-Use the code below to get started with the model.
-[More Information Needed]
-## Training Details
-### Training Data
-<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
-[More Information Needed]
-### Training Procedure
-<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
-#### Preprocessing [optional]
-[More Information Needed]
-#### Training Hyperparameters
-- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
-#### Speeds, Sizes, Times [optional]
-<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
-[More Information Needed]
-## Evaluation
-<!-- This section describes the evaluation protocols and provides the results. -->
-### Testing Data, Factors & Metrics
-#### Testing Data
-<!-- This should link to a Dataset Card if possible. -->
-[More Information Needed]
-#### Factors
-<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
-[More Information Needed]
-#### Metrics
-<!-- These are the evaluation metrics being used, ideally with a description of why. -->
-[More Information Needed]
-### Results
-[More Information Needed]
-#### Summary
-## Model Examination [optional]
-<!-- Relevant interpretability work for the model goes here -->
-[More Information Needed]
-## Environmental Impact
-<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-- **Hardware Type:** [More Information Needed]
-- **Hours used:** [More Information Needed]
-- **Cloud Provider:** [More Information Needed]
-- **Compute Region:** [More Information Needed]
-- **Carbon Emitted:** [More Information Needed]
-## Technical Specifications [optional]
-### Model Architecture and Objective
-[More Information Needed]
-### Compute Infrastructure
-[More Information Needed]
-#### Hardware
-[More Information Needed]
-#### Software
-[More Information Needed]
-## Citation [optional]
-<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-**BibTeX:**
-[More Information Needed]
-**APA:**
-[More Information Needed]
-## Glossary [optional]
-<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-[More Information Needed]
-## More Information [optional]
-[More Information Needed]
-## Model Card Authors [optional]
-[More Information Needed]
-## Model Card Contact
-[More Information Needed]

 ---
 library_name: transformers
+license: apache-2.0
+datasets:
+- roneneldan/TinyStories
+language:
+- en
 ---
+# Tiny Recursive Model (TRM)
+A compact language model featuring a recursive architecture designed for efficient text generation. This model uses a custom `TinyRecursiveModel` class with a ~7M parameter logic core [1].
 ## Model Details
+- **Model Type**: Causal Language Model with Custom Recursive Architecture
+- **Parameters**: ~40.21M total parameters (7.39M logic core, 32.82M vocabulary)
+- **Architecture**: 3 physical layers, 8 recursive loops, 8 attention heads [1]
+- **Vocabulary Size**: 50,257 tokens
+- **Context Length**: 1024 tokens
+- **Embedding Dimension**: 512
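As a quick sanity check on the figures above, the reported breakdown can be added up in a few lines of Python. This is only a sketch: the variable names are illustrative, and the numbers are the rounded values quoted in this card, in millions of parameters.

```python
# Rounded parameter counts quoted in the model card, in millions.
logic_core_m = 7.39
vocabulary_m = 32.82

# Sum the two parts and compute how much of the model is the logic core.
total_m = round(logic_core_m + vocabulary_m, 2)
logic_share = logic_core_m / total_m

print(f"Total: {total_m}M parameters")         # Total: 40.21M parameters
print(f"Logic core share: {logic_share:.1%}")  # Logic core share: 18.4%
```

By these figures, the vocabulary side accounts for most of the parameters, while the recursive logic core is under a fifth of the total.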
+## ⚠️ Important: Custom Model Class
+This model uses a **custom `TinyRecursiveModel` class** that is not part of the standard transformers library [1]. You must use `trust_remote_code=True` when loading the model.
+## Installation Requirements
+```bash
+pip install transformers torch
+```
+## Usage
+### Method 1: Using trust_remote_code (Recommended)
+```python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+import torch
+# Load the model and tokenizer (MUST use trust_remote_code=True)
+model_name = "ainz/tiny-recursive-model"
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    trust_remote_code=True  # Required for custom model class
+)
+# Generate text
+input_text = "Once upon a time"
+inputs = tokenizer(input_text, return_tensors="pt")
+with torch.no_grad():
+    outputs = model.generate(
+        inputs["input_ids"],
+        max_length=100,
+        do_sample=True,
+        temperature=0.7,
+        top_p=0.9,
+        pad_token_id=tokenizer.eos_token_id
+    )
+generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
+print(generated_text)
+```
+### Method 2: Manual Class Loading
+If you prefer not to use `trust_remote_code`, you can manually download and use the model files:
+```python
+import torch
+from huggingface_hub import hf_hub_download
+# Download the model files
+model_path = hf_hub_download(repo_id="ainz/tiny-recursive-model", filename="pytorch_model.bin")
+config_path = hf_hub_download(repo_id="ainz/tiny-recursive-model", filename="config.json")
+# You'll need to copy the TinyRecursiveModel class definition locally
+# Then load manually:
+# model = TinyRecursiveModel.from_pretrained("ainz/tiny-recursive-model")
+```
+### Batch Generation Example
+```python
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# Load model with trust_remote_code
+tokenizer = AutoTokenizer.from_pretrained("ainz/tiny-recursive-model")
+model = AutoModelForCausalLM.from_pretrained(
+    "ainz/tiny-recursive-model",
+    trust_remote_code=True
+)
+# Generate for multiple prompts
+prompts = [
+    "The future of artificial intelligence",
+    "In a distant galaxy",
+    "The secret to happiness"
+]
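+# Batched generation needs a pad token, and decoder-only models are usually left-padded.
+# Reusing EOS as the pad token is an assumption, in case the tokenizer defines none.
+if tokenizer.pad_token is None:
+    tokenizer.pad_token = tokenizer.eos_token
+tokenizer.padding_side = "left"
+inputs = tokenizer(prompts, return_tensors="pt", padding=True, truncation=True)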
+with torch.no_grad():
+    outputs = model.generate(
+        inputs["input_ids"],
+        attention_mask=inputs["attention_mask"],
+        max_length=80,
+        do_sample=True,
+        temperature=0.7,
+        pad_token_id=tokenizer.eos_token_id
+    )
+for i, output in enumerate(outputs):
+    text = tokenizer.decode(output, skip_special_tokens=True)
+    print(f"Prompt {i+1}: {text}\n")
+```
+### Advanced Generation Parameters
+```python
+# Assumes `model`, `tokenizer`, and `inputs` are already defined as in Method 1
+# More creative generation
+outputs = model.generate(
+    inputs["input_ids"],
+    max_length=150,
+    do_sample=True,
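+    temperature=0.8,         # Higher = more random, less predictable output
+    top_k=50,                # Sample only from the 50 most likely tokens
+    top_p=0.95,              # Nucleus sampling: keep the top 95% of probability mass
+    repetition_penalty=1.1,  # Values above 1.0 discourage repeated tokens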
+    pad_token_id=tokenizer.eos_token_id
+)
+# Deterministic generation
+outputs = model.generate(
+    inputs["input_ids"],
+    max_length=100,
+    do_sample=False,       # Greedy decoding
+    pad_token_id=tokenizer.eos_token_id
+)
+```
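To make the `temperature` and `top_p` settings above more concrete, here is a minimal pure-Python sketch of what they do to a next-token distribution. It is a conceptual illustration only, not the `transformers` implementation: the helper names `softmax` and `top_p_filter` and the toy logits are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; lower temperature sharpens the distribution."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative mass reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    # Renormalize so the surviving tokens form a valid distribution.
    return {i: probs[i] / mass for i in kept}

logits = [2.0, 1.0, 0.5, -1.0]              # toy scores for a 4-token vocabulary
sharp = softmax(logits, temperature=0.7)    # more peaked
flat = softmax(logits, temperature=1.5)     # more spread out
nucleus = top_p_filter(sharp, top_p=0.9)    # token index -> renormalized probability

print(sharp[0] > flat[0])  # True: lower temperature favors the top token
print(sorted(nucleus))     # [0, 1]: only the two most likely tokens survive
```

Sampling then draws the next token from the filtered distribution, which is why a lower `temperature` or `top_p` tends to give more predictable text.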
+## Architecture Overview
+This model implements a recursive architecture in which a small stack of transformer layers is reused across several loops instead of stacking new layers [1]. Key features:
+- **Recursive Layers**: 3 physical transformer layers recursively applied 8 times
+- **Parameter Efficiency**: Achieves 7.39M logic parameters through recursive design
+- **Custom Implementation**: Uses `TinyRecursiveModel` class with `TRMConfig`
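The idea of reusing layers through loops can be sketched with a toy example. This is not the actual `TinyRecursiveModel` code: `ToyLayer` and `recursive_forward` are hypothetical stand-ins, each "layer" is a scalar affine map rather than a transformer block, and the sketch assumes every loop runs the full stack of three layers in order.

```python
from dataclasses import dataclass

@dataclass
class ToyLayer:
    """Stand-in for one physical layer: a scalar affine map with its own weights."""
    weight: float
    bias: float

    def __call__(self, x: float) -> float:
        return self.weight * x + self.bias

def recursive_forward(x, layers, num_loops):
    """Run the same stack of layers num_loops times, reusing the same weights."""
    applications = 0
    for _ in range(num_loops):
        for layer in layers:
            x = layer(x)
            applications += 1
    return x, applications

# 3 physical layers and 8 loops, matching the counts quoted in this card.
layers = [ToyLayer(0.9, 0.1), ToyLayer(1.1, -0.05), ToyLayer(1.0, 0.02)]
output, effective_depth = recursive_forward(1.0, layers, num_loops=8)
num_params = 2 * len(layers)  # one weight and one bias per physical layer

print(effective_depth)  # 24 layer applications
print(num_params)       # 6 parameters, regardless of the number of loops
```

The parameter count depends only on the number of physical layers, while the amount of computation grows with the number of loops.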
+## Model Performance
+Training completed with:
+- **Final Training Loss**: ~2.0
+- **Training Steps**: 7,032 (1 epoch)
+- **Parameter Breakdown**: 7.39M logic core + 32.82M vocabulary
+## Security Note
+This model requires `trust_remote_code=True` because it uses custom model architecture code, and loading it executes Python code from the model repository. Only enable this option if you trust the model source.
+## Troubleshooting
+**Error loading model?**
+- Make sure you're using `trust_remote_code=True`
+- Ensure you have the latest transformers version: `pip install --upgrade transformers`
+**Generation issues?**
+- The model is small (7.39M logic parameters), so output quality is sensitive to sampling settings; try adjusting `temperature`, `top_k`, and `top_p`
+- Try different prompt formats for better results
+## Limitations
+- Small model size (~7M logic parameters) may limit performance compared to larger models
+- Custom architecture requires `trust_remote_code=True`
+- Trained on TinyStories, so it is best suited for short, simple English stories and basic text completion
+## Citation
+```bibtex
+@misc{tiny_recursive_model_2025,
+  author = {ainz},
+  title = {Tiny Recursive Model},
+  year = {2025},
+  publisher = {Hugging Face},
+  url = {https://huggingface.co/ainz/tiny-recursive-model}
+}
+```