---
library_name: transformers
language:
- en
tags:
- reasoning
- implicit-reasoning
- chain-of-thought
- llama
- asterisk
- aspp
- pi-flow
- deep-reasoning
license: llama3.2
base_model: meta-llama/Llama-3.2-1B-Instruct
model_name: Geilim-1B-Instruct
datasets:
- gsm8k
- hellaswag
- ai2_arc
pipeline_tag: text-generation
inference: true
---

# Geilim-1B-Instruct (忌廉)

> **Deep Causal Internal Reasoning**
> No verbose CoT, no `<think>` tags, just concise answers powered by implicit reasoning.

---

## 💡 Introduction

Recent advances in reasoning models (DeepSeek R1, o1) have demonstrated impressive capabilities through Chain-of-Thought (CoT) reasoning. However, we observe several critical drawbacks:

**Problems with External CoT:**
1. **Verbosity Tax**: Models generate hundreds of tokens inside `<think>` tags before answering, increasing latency and cost
2. **Autoregressive Dependency**: Models must "see" their reasoning to follow it, forcing sequential token generation
3. **Token Inefficiency**: Users pay for reasoning traces they often don't need; only the final answer matters
4. **Production Overhead**: Verbose outputs are impractical for real-time APIs and edge deployment

**Our Insight**: What if reasoning could happen *internally*, in the model's hidden states, without generating verbose traces?

**Geilim-1B-Instruct** addresses these limitations through a hybrid architecture combining:
- **ASPP (Adjacency-Structured Parallel Propagation)**: Graph-based causal chains for structured reasoning
- **π-flow (Probability Flow Dynamics)**: Internal refinement in probability space without token generation
- **Hybrid Gating**: Learnable balance between structured and attention-based processing

The result: Deep reasoning capability with concise outputs - the best of both worlds.

---

## 🎯 Core Value Proposition

**Geilim-1B-Instruct is the anti-verbose reasoning model.**

| Model Type | Reasoning Approach | Output Style |
|------------|-------------------|--------------|
| **Baseline** (Llama-3.2-1B) | Limited reasoning | Direct but may lack depth |
| **CoT Models** (DeepSeek R1, o1) | External reasoning chains | Verbose `<think>` tags, long outputs |
| **Geilim-1B-Instruct** | **Internal reasoning** | **Concise answers, reasoning in hidden states** |

**Key Differentiator**: Geilim performs deep causal reasoning **internally** through the ASPP+π-flow architecture, then outputs only the final answer. You get the reasoning quality without the verbosity tax.

---

## 🏗️ Architecture Overview

Geilim-1B-Instruct combines three key components for implicit reasoning:

### 1. **ASPP Operator** (Adjacency-Structured Parallel Propagation)
- **Union-Find graph structure**: Linear causal chain where each token only connects to its parent
- **Iterative message passing**: `h_i^(t+1) = φ(h_i^(t), h_parent[i])`
- **K-step evolution**: Adaptive 2-8 steps of causal propagation
- **Complexity**: O(n) - efficient linear-time reasoning

**Why it matters**: ASPP creates explicit causal relationships between tokens, allowing information to flow through a reasoning chain without generating output tokens.
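
As a toy sketch of this parent-only propagation (the averaging update, function names, and shapes here are illustrative assumptions, not the model's actual learned operator):

```python
# Toy sketch of ASPP propagation over a linear causal chain.
# Each token i receives a message only from its parent (token i-1);
# the real model applies a learned update, here we average for illustration.

def aspp_propagate(hidden, num_steps=2):
    """hidden: list of per-token state vectors (lists of floats)."""
    n = len(hidden)
    for _ in range(num_steps):
        updated = [hidden[0]]  # the root token has no parent, keep it as-is
        for i in range(1, n):
            parent = hidden[i - 1]
            # toy update: blend each token's state with its parent's state
            updated.append([(a + b) / 2 for a, b in zip(hidden[i], parent)])
        hidden = updated  # all positions update in parallel each step
    return hidden

states = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
out = aspp_propagate(states, num_steps=2)
print(out[2])  # [0.25, 0.0] -- information from token 0 reached token 2
```

With K propagation steps, information travels up to K hops along the chain in O(n) work per step, which is the linear-time behavior described above.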

### 2. **π-flow** (Probability Flow Dynamics)
- **Velocity field learning**: `h' = h + α * v(h)` where `v(h)` is a learned refinement
- **Multi-step refinement**: Iterates in probability space to converge on the correct answer
- **Gated application**: Model learns when to refine (complex questions) vs when to skip (simple questions)
- **Internal convergence**: Reasoning happens in hidden states, not in generated text

**Why it matters**: π-flow eliminates the need for external CoT by performing iterative refinement internally. The model "thinks" in its hidden states and outputs only the final result.
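
As a toy illustration of the refinement loop (the real velocity field is learned; here a hand-written field pulls a probability vector toward a target, and all names are chosen for this sketch):

```python
# Toy sketch of probability-flow refinement: h' = h + alpha * v(h),
# iterated for a fixed number of steps. The model learns v; this
# hand-written field pulls the state toward a target distribution.

def pi_flow_refine(h, velocity, alpha=0.5, steps=2):
    for _ in range(steps):
        v = velocity(h)
        h = [hi + alpha * vi for hi, vi in zip(h, v)]
    return h

target = [0.0, 1.0, 0.0]                          # toy "correct answer" mass
velocity = lambda h: [t - x for x, t in zip(h, target)]

h0 = [0.6, 0.2, 0.2]                              # initial judgment
h2 = pi_flow_refine(h0, velocity, alpha=0.5, steps=2)
print(h2)  # mass has shifted toward index 1 after two internal steps
```

No tokens are emitted during the loop; only the final, refined state feeds generation.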

### 3. **Hybrid Gating Mechanism**
```
output = gate * ASPP(x) + (1 - gate) * Attention(x)
```
- Combines structured causal reasoning (ASPP) with flexible attention
- Learnable balance between graph-based and sequence-based processing
- Applied to all 16 layers of the base model (Llama-3.2-1B)
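
A scalar version of this gate can be sketched as follows (toy code; in the model the gate is learned per layer and the two branches are full tensor-valued sub-modules):

```python
# Toy sketch of the hybrid gate: a value in [0, 1] blends the ASPP
# branch with the attention branch, elementwise over the hidden state.

def hybrid_mix(aspp_out, attn_out, gate):
    assert 0.0 <= gate <= 1.0, "gate must lie in [0, 1]"
    return [gate * a + (1.0 - gate) * b for a, b in zip(aspp_out, attn_out)]

# gate = 1.0 -> pure structured (ASPP) path; gate = 0.0 -> pure attention
mixed = hybrid_mix([1.0, 1.0], [0.0, 2.0], gate=0.25)
print(mixed)  # [0.25, 1.75]
```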

---

## 🧠 Why π-flow Eliminates Verbosity

### The Problem with Traditional CoT

**External Reasoning Models** (DeepSeek R1, o1-style):
```
User: What is 15 * 8?

Model: <think>
Let me break this down step by step:
1. First, I'll multiply 15 by 8
2. 15 * 8 = 15 * (10 - 2)
3. Using the distributive property: 15*10 - 15*2
4. 150 - 30 = 120
Therefore, the answer is 120.
</think>

The answer is 120.
```
- **Output**: 250+ characters
- **Latency**: High (many tokens to generate)
- **Cost**: Expensive (charged per token)

### Geilim's Internal Reasoning

**Geilim-1B-Instruct** (ASPP+π-flow):
```
User: What is 15 * 8?

Model: 120
```
- **Output**: 3 characters
- **Latency**: Low (minimal generation)
- **Cost**: Minimal
- **Reasoning**: Happened internally through:
  1. ASPP causal chain propagating arithmetic relationships
  2. π-flow refining the probability distribution across the answer space
  3. Convergence to the correct answer in hidden states

---


## 🔬 Technical Mechanism

### How π-flow Achieves Internal Reasoning

1. **Probability Space Operations**
   - Instead of generating tokens to explore answers, π-flow refines probability distributions directly
   - `v(h)`: Learned velocity field that corrects the model's initial judgment
   - Multi-step: `h^(0) → h^(1) → h^(2)` (2 refinement steps)

2. **Convergence Without Output**
   - Traditional models need to "see" their reasoning to follow it (autoregressive dependency)
   - π-flow breaks this: reasoning occurs in parallel across all positions simultaneously
   - The model converges internally before generating any output token

3. **Adaptive Complexity**
   - `pi_flow_use_gate=True`: Model learns when refinement is needed
   - Simple questions: Direct output (gate ≈ 0, skip refinement)
   - Complex questions: Internal multi-step refinement (gate ≈ 1, apply π-flow)
   - User always sees concise output regardless

4. **Synergy with ASPP**
   - ASPP provides causal structure (parent-child dependencies)
   - π-flow refines along these dependencies
   - **Result**: Structured reasoning (not just attention) + probabilistic convergence = deep causal understanding
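
Putting points 1-3 together, the gated refinement can be sketched with scalars (toy code; in the actual model both the sigmoid gate and the velocity field are learned, and these names are invented for the sketch):

```python
import math

# Toy sketch of gated refinement: a sigmoid gate scales how strongly
# the refinement step is applied. Simple inputs yield a near-zero gate
# (skip), complex inputs a near-one gate (refine). All values are toys.

def gated_refine(h, velocity, gate_logit, alpha=0.5, steps=2):
    gate = 1.0 / (1.0 + math.exp(-gate_logit))  # sigmoid -> (0, 1)
    for _ in range(steps):
        h = [x + gate * alpha * v for x, v in zip(h, velocity(h))]
    return h

decay = lambda h: [-x for x in h]  # toy field: pull the state toward zero

easy = gated_refine([1.0], decay, gate_logit=-10.0)  # gate ~ 0: untouched
hard = gated_refine([1.0], decay, gate_logit=10.0)   # gate ~ 1: refined
print(easy, hard)
```

Either way, the user-visible output is the same length; only the internal computation differs.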

---

## ⚙️ Configuration

### Model Architecture
- **Base Model**: Llama-3.2-1B-Instruct (1.26B params)
- **Total Parameters**: ~1.4B (140M additional ASPP+π-flow params)
- **Hybrid Layers**: All 16 layers (universal reasoning capability)

### ASPP Settings
```python
aspp_hidden_dim: 512    # vs 2048 model hidden_size (reduces overfitting)
aspp_num_steps: 2-8     # learnable via sigmoid gating
aspp_dropout: 0.15
aspp_num_neighbors: 1   # Union-Find: parent-only connections
```

### π-flow Settings
```python
pi_flow: True           # Enable probability flow refinement
pi_flow_steps: 2        # 2-step refinement
pi_flow_scale: 0.5      # Moderate refinement strength
pi_flow_use_gate: True  # Adaptive gating
```

---

## 🚀 Quick Start

### Installation
```bash
pip install transformers torch
```

### Basic Usage
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model
model_path = "NoesisLab/Geilim-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Generate response
prompt = "A store has 120 apples. They sell 35 in the morning and 48 in the afternoon. How many are left?"
messages = [{"role": "user", "content": prompt}]

input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
)

response = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(response)  # Expected: "37" or "37 apples are left." (concise!)
```

### Advanced Usage
```python
# For math problems requiring step-by-step work (if needed).
# Note: Geilim prefers concise outputs, but can show work when prompted.
prompt = "Explain how you would solve: What is 15 * 23?"

# Recommended settings for implicit reasoning
generation_config = {
    "max_new_tokens": 128,        # keep low to encourage conciseness
    "temperature": 0.7,           # moderate sampling
    "do_sample": True,
    "top_p": 0.9,
    "repetition_penalty": 1.1,    # prevent loops
}

# Reuses the model and tokenizer loaded in Basic Usage
messages = [{"role": "user", "content": prompt}]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, **generation_config)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

---

## 📚 Training Details

### Dataset
- **Mixed-Benchmark-Dataset** (composite reasoning benchmarks)
  - 25% GSM8K (math reasoning)
  - 30% HellaSwag (commonsense)
  - 20% ARC (science QA)
  - 10% OpenHermes (high-quality responses)
  - 15% Capybara (multi-turn conversations)

### Training Configuration
- **Framework**: TRL SFTTrainer with packing
- **Epochs**: 2
- **Batch Size**: Effective 8 (per_device=2, grad_accum=4)
- **Learning Rate**: 2e-4 with 10% warmup
- **Precision**: bfloat16 with gradient checkpointing
- **Optimizer**: AdamW (weight_decay=0.1, max_grad_norm=1.0)

### Training Philosophy
Unlike CoT models trained on verbose reasoning chains, Geilim is trained on **answer-focused data** where:
- Correct answers are rewarded
- Reasoning quality is learned implicitly through ASPP+π-flow gradients
- The model learns to converge internally rather than generate external reasoning

---

## 📈 Evaluation

### Reasoning Quality Tests
Geilim is evaluated on:
1. **Math reasoning** (GSM8K-style arithmetic)
2. **Commonsense reasoning** (HellaSwag, PIQA)
3. **Logic puzzles** (multi-hop deduction)
4. **Reading comprehension** (information tracking)
5. **Causal reasoning** (cause-effect relationships)

### Key Metrics
- **Answer correctness** (primary goal)
- **Response conciseness** (< 150 chars = concise)
- **Reasoning traces** (should be absent from output, present in hidden states)
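
For illustration, the conciseness metric reduces to a trivial check (the 150-character threshold is taken from the list above; the helper name is made up for this sketch):

```python
# Toy check for the conciseness metric: a response counts as concise
# when it fits within the character budget (150 chars, per the card).

def is_concise(response, limit=150):
    return len(response.strip()) <= limit

print(is_concise("37 apples are left."))                                # True
print(is_concise("<think>" + "step by step... " * 20 + "</think> 37"))  # False
```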

---

## 🎯 Use Cases

### Ideal For:
- **Production APIs**: Low latency, low token cost
- **Real-time applications**: Minimal generation overhead
- **Cost-sensitive deployments**: Pay only for the answer, not the reasoning
- **User-facing chat**: Clean outputs without technical reasoning traces
- **Mobile/edge devices**: Smaller token budgets

### Not Ideal For:
- **Educational use cases**: When you want to show reasoning steps to users
- **Debugging/verification**: When explicit reasoning helps validate answers
- **Research**: When analyzing reasoning chains is the goal

---

## 📊 Comparison Table

| Feature | Geilim-1B-Instruct | DeepSeek R1 | Llama-3.2-1B |
|---------|--------------------|-------------|--------------|
| **Model Size** | 1.4B | 1.5B | 1.26B |
| **Reasoning Type** | Internal (ASPP+π-flow) | External (CoT) | Limited |
| **Output Style** | Concise answers | Verbose `<think>` tags | Direct answers |
| **Latency** | Low | High (many tokens) | Low |
| **Cost per query** | Low | High | Low |
| **Reasoning depth** | Deep (hidden states) | Deep (explicit) | Shallow |
| **Token efficiency** | High | Low | Medium |

---

## 📖 Technical References

### Core Papers & Concepts
- **Union-Find Data Structure**: Parent-only connections for efficient causal propagation
- **Probability Flow ODEs**: Continuous refinement in probability space (inspired by diffusion models)
- **Hybrid Architectures**: Combining structured (graph) and unstructured (attention) reasoning

### Related Work
- DeepSeek R1: External reasoning chains
- o1 series: Long-form CoT reasoning
- SmolLM2: Efficient small language models
- Graph Neural Networks: Structured message passing

---

## 🔧 Development

### Custom Model Registration
- **Model type**: `asterisk` (registered with the HuggingFace AutoModel machinery)
- **Config class**: `AsteriskConfig` (extends `LlamaConfig`)
- **Model class**: `AsteriskForCausalLM` (extends `LlamaForCausalLM`)
- **Loading**: Requires `trust_remote_code=True`

---

## 🌟 Key Takeaways

1. **No verbose CoT**: Geilim performs reasoning internally and outputs concisely
2. **ASPP+π-flow**: Causal graph structure + probability-flow refinement
3. **Deep causal understanding**: Reasoning happens in hidden states, not generated text
4. **Production-ready**: Low latency, low cost, clean outputs
5. **Comparable reasoning depth**: Aims to match CoT models without the verbosity

---

## 📄 Citation

If you use Geilim-1B-Instruct in your research or applications, please cite:

```bibtex
@misc{geilim2026,
  title={Geilim-1B-Instruct: Deep Causal Internal Reasoning via ASPP and Probability Flow},
  author={NoesisLab},
  year={2026},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/NoesisLab/Geilim-1B-Instruct}
}
```

---

## 🤝 Acknowledgments

- **Base Model**: Llama-3.2-1B-Instruct by Meta
- **Training Framework**: TRL by HuggingFace
- **Inspiration**: DeepSeek R1 (for demonstrating the value of reasoning), while pursuing conciseness

---

## 📜 License

Llama 3.2 Community License

---

## 🔗 Links

- **Model Hub**: https://huggingface.co/NoesisLab/Geilim-1B-Instruct

---

**Built with ❤️ for the era of efficient reasoning models.**

*Geilim (忌廉) - Cantonese for "cream" - smooth, concise, and rich in substance.*