hbfreed committed (verified) · Commit a13f2c8 · Parent(s): 08ca042

Upload README.md with huggingface_hub

---
language:
- en
license: apache-2.0
library_name: transformers
tags:
- pruned_flex_olmo
- custom_code
- math
- pruned
- distilled
- mixture-of-experts
base_model: allenai/Flex-math-2x7B-1T
pipeline_tag: text-generation
---

# flex-math-5504

A pruned and distilled variant of [allenai/Flex-math-2x7B-1T](https://huggingface.co/allenai/Flex-math-2x7B-1T) with a variable-width expert MLP. Expert 1 has been pruned from the full intermediate size of 11,008 down to **5504** (50% of the original width), then recovered via knowledge distillation.

| | |
|---|---|
| **Total Parameters** | 9.5B |
| **Expert 1 Parameters** | 2.2B |
| **Expert 1 Width** | 5504 (50%) |
| **Base Model** | allenai/Flex-math-2x7B-1T (11.6B params) |

For full details, see the [blog post](https://hbfreed.com/2026/01/28/variable-flexolmo.html).

## How to Use

This repo includes a `modeling_pruned_flex_olmo.py` file that handles the variable-width expert architecture. Load it with `trust_remote_code=True` and it works like any other Hugging Face model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("hbfreed/flex-math-5504", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("allenai/Flex-math-2x7B-1T")

input_text = "Solve: What is 15% of 200?"
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The tokenizer is the same as the base model's.

## How It Was Made

1. **Structured pruning**: Neuron-importance scores were computed on math-specific data (GSM8K, MetaMath, and TuluMath subsets). The least important neurons in Expert 1's gate/up/down projections were removed, reducing the intermediate size from 11,008 to 5504.
2. **Knowledge distillation**: The pruned model was retrained for ~228M tokens against the top-128 logprobs of the full-sized teacher model. Distillation data: [hbfreed/flexolmo-math-logprobs](https://huggingface.co/datasets/hbfreed/flexolmo-math-logprobs).
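
The pruning step can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: the importance heuristic here (L2 norm of each neuron's down-projection column) and all names are assumptions; the real scores were activation-based and computed on the math calibration data above.

```python
import torch

def prune_expert_mlp(gate_w, up_w, down_w, importance, new_width):
    """Keep the `new_width` most important intermediate neurons.

    gate_w, up_w: (intermediate, hidden); down_w: (hidden, intermediate),
    matching the usual SwiGLU MLP weight layout. Rows of gate/up and
    columns of down that belong to pruned neurons are dropped together.
    """
    keep = importance.topk(new_width).indices.sort().values
    return gate_w[keep], up_w[keep], down_w[:, keep]

hidden, inter, new_width = 64, 128, 64
gate = torch.randn(inter, hidden)
up = torch.randn(inter, hidden)
down = torch.randn(hidden, inter)
# Stand-in importance score (an assumption, not the real metric):
# L2 norm of each neuron's down-projection column.
importance = down.norm(dim=0)
gate_p, up_p, down_p = prune_expert_mlp(gate, up, down, importance, new_width)
print(gate_p.shape, up_p.shape, down_p.shape)
# torch.Size([64, 64]) torch.Size([64, 64]) torch.Size([64, 64])
```

The key invariant is that the same neuron indices are removed from all three projections, so the pruned MLP stays a valid (narrower) SwiGLU block.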

Math-calibrated importance analysis was used: 58% of the top-2048 neurons differ between math-calibrated and general-calibrated rankings.

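The distillation objective can be sketched as a forward KL computed only over the stored top-k teacher entries. This is a minimal sketch under assumed tensor shapes; the function and variable names are illustrative, not taken from the training code:

```python
import torch
import torch.nn.functional as F

def topk_kd_loss(student_logits, teacher_topk_logprobs, teacher_topk_ids):
    # student_logits: (batch, seq, vocab)
    # teacher_topk_logprobs / teacher_topk_ids: (batch, seq, k), stored offline
    student_logprobs = F.log_softmax(student_logits, dim=-1)
    # Gather the student's logprobs at the teacher's top-k token ids
    student_at_topk = student_logprobs.gather(-1, teacher_topk_ids)
    teacher_probs = teacher_topk_logprobs.exp()
    # Forward KL restricted to the teacher's top-k support:
    # sum_k p_teacher * (log p_teacher - log p_student)
    loss = (teacher_probs * (teacher_topk_logprobs - student_at_topk)).sum(-1)
    return loss.mean()

# Toy example with random tensors standing in for real model outputs
torch.manual_seed(0)
B, T, V, K = 2, 4, 100, 8
student_logits = torch.randn(B, T, V)
teacher_logits = torch.randn(B, T, V)
teacher_logprobs = F.log_softmax(teacher_logits, dim=-1)
topk_logprobs, topk_ids = teacher_logprobs.topk(K, dim=-1)
print(topk_kd_loss(student_logits, topk_logprobs, topk_ids).item())
```

Storing only the top-128 logprobs keeps the distillation dataset small while retaining nearly all of the teacher's probability mass at each position.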
## Benchmark Results

| Model | GSM8K | MATH | Math2 |
|---|---|---|---|
| No-expert baseline (7.3B) | — | — | 8.1 |
| **flex-math-5504** | **66.6** | **26.8** | **46.7** |
| Full teacher (11.6B) | 69.7 | 35.4 | 52.5 |

### All Variants

| Model | Total Params | Expert Width | GSM8K | MATH | Math2 |
|---|---|---|---|---|---|
| [flex-math-8192](https://huggingface.co/hbfreed/flex-math-8192) | 10.5B | 8192 (74%) | 70.1 | 31.3 | 50.7 |
| [flex-math-5504](https://huggingface.co/hbfreed/flex-math-5504) | 9.5B | 5504 (50%) | 66.6 | 26.8 | 46.7 |
| [flex-math-2048](https://huggingface.co/hbfreed/flex-math-2048) | 8.1B | 2048 (19%) | 44.3 | 13.9 | 29.1 |

## License

Apache 2.0 (same as base model)