Asankhaya Sharma (codelion) · PRO
378 followers · 21 following
Website: http://asankhaya.github.io/
Linked accounts: asankhaya · codelion · asankhaya
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.
Recent Activity
Reacted to their own post with 🚀, 👍, and 🔥 · 1 day ago:
Recently, Essential AI released a new 8B base model (https://huggingface.co/EssentialAI/rnj-1) and highlighted the importance of the data mix for pretraining: "In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this." This resonates with our recent work on optimal dataset mixing for pretraining, where we saw that having the right mix can increase training efficiency: https://huggingface.co/blog/codelion/optimal-dataset-mixing
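As a rough illustration of what "mixing data distributions" can look like in practice, here is a minimal sketch using the Hugging Face datasets library. The corpora and mixing probabilities below are hypothetical placeholders, not the mix studied in the optimal-dataset-mixing post or by Essential AI.

```python
# Sketch of a weighted pretraining data mix with the `datasets` library.
# The sources and probabilities are illustrative placeholders only.
from datasets import interleave_datasets, load_dataset

# Stream three hypothetical domain buckets so nothing is downloaded in full.
web = load_dataset("HuggingFaceFW/fineweb-edu", split="train", streaming=True)
wiki = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)
math = load_dataset("open-web-math/open-web-math", split="train", streaming=True)

# Keep only the shared "text" column so the sources can be interleaved.
web, wiki, math = (ds.select_columns(["text"]) for ds in (web, wiki, math))

# The sampling probabilities are the data mix; choosing them well is the
# knob that dataset-mixing work tunes.
mixed = interleave_datasets(
    [web, wiki, math],
    probabilities=[0.6, 0.25, 0.15],
    seed=42,
    stopping_strategy="all_exhausted",  # keep drawing until every source is exhausted
)

# Peek at a few mixed examples.
for i, example in enumerate(mixed.take(3)):
    print(i, example["text"][:80].replace("\n", " "))
```

In a real run, the resulting stream would be tokenized and packed into training batches; the point of the sketch is only that the mix is just a set of sampling weights over domain buckets.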
codelion's models (27), sorted by most recently updated:
codelion/gpt-2-70m • Text Generation • 64.1M params • Updated Nov 2 • 964 • 16
codelion/Qwen3-4B-execution-world-model-lora • Text Generation • Updated Oct 20 • 51 • 3
codelion/Qwen2.5-Coder-0.5B-Instruct-security-grpo-lora • Text Generation • Updated Aug 2 • 5
codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora • Text Generation • Updated Jul 20 • 22
codelion/Llama-3.2-1B-Instruct-tool-calling-lora • Text Generation • Updated Jul 18 • 101 • 4
codelion/gemma-3-1b-it-reasoning-grpo-lora • Text Generation • Updated Jul 18 • 15 • 5
codelion/Qwen3-0.6B-ICM-DPO • Text Generation • 0.6B params • Updated Jul 18 • 5
codelion/gemma-3-1b-it-ICM-DPO • Text Generation • 1.0B params • Updated Jul 18 • 8
codelion/gemma-3-1b-it-ICM-DPO-mlx-fp16 • Text Generation • 1B params • Updated Jul 17 • 18
codelion/Qwen3-0.6B-ICM-DPO-mlx-fp16 • Text Generation • 0.6B params • Updated Jul 17 • 37 • 2
codelion/Qwen3-0.6B-accuracy-recovery-lora • Text Generation • Updated Jul 13 • 42 • 4
codelion/Qwen3-0.6B-GRPO-mlx-fp16 • Text Generation • 0.6B params • Updated Jul 11 • 15
codelion/Qwen3-0.6B-GRPO • Text Generation • 0.6B params • Updated Jul 11 • 5
codelion/DeepSeek-R1-Distill-Qwen-1.5B-PTS-DPO • Text Generation • 2B params • Updated May 13 • 9 • 2
codelion/Qwen3-0.6B-PTS-DPO • Text Generation • 0.6B params • Updated May 12 • 22 • 1
codelion/Qwen3-0.6B-PTS-DPO-LoRA • Updated May 7 • 1
codelion/optillm-bert-uncased • 0.3B params • Updated Feb 16 • 23 • 5
codelion/optillm-modernbert-large • 0.4B params • Updated Feb 16 • 18 • 9
codelion/Llama-3.3-70B-o1 • Text Generation • 71B params • Updated Jan 21 • 64 • 2
codelion/Llama-3.3-70B-o1-gguf • 71B params • Updated Jan 20 • 47 • 1
codelion/Llama-3.3-70B-o1-lora • Updated Jan 20 • 2
codelion/Llama-3.2-3B-o1 • 3B params • Updated Jan 12 • 72 • 5
codelion/Llama-3.2-3B-o1-lora • Updated Jan 12 • 4
codelion/MathCoT • 8B params • Updated Nov 26, 2024 • 41 • 2
codelion/scorelora • Updated Oct 15, 2024 • 2 • 3
codelion/public-domain-mickey-mouse • Text-to-Image • Updated Jan 5, 2024 • 7 • 2
codelion/whisper-age-estimator • Automatic Speech Recognition • 72.6M params • Updated Sep 10, 2023 • 27 • 3