In a Training Loop 🔄
Asankhaya Sharma
PRO
codelion
374 followers · 21 following
http://asankhaya.github.io/
asankhaya
codelion
asankhaya
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.
Recent Activity
reacted to their post with 🚀, 👍, and 🔥 about 16 hours ago
Recently, Essential AI released a new 8B base model, https://huggingface.co/EssentialAI/rnj-1, and highlighted the importance of the data mix for pretraining: "In the long run, we expect our methods to automatically represent, transform, and blend data to optimize measurable abilities in pre-training. Our work on modeling data taxonomies led to new approaches for jointly clustering and mixing data distributions under data repetition penalties. Many improvements in our STEM abilities can be traced back to this." This resonates with our recent work on optimal dataset mixing for pretraining, where we saw that having the right mix can increase training efficiency: https://huggingface.co/blog/codelion/optimal-dataset-mixing
codelion's activity
liked a model 5 days ago
adaptive-classifier/browsesafe • Text Classification • Updated 5 days ago • 20 • 1
liked 10 datasets 11 days ago
ByteDance-Seed/Code-Contests-Plus • Updated Nov 6 • 49.2k • 6.28k • 49
codelion/dclm-baseline-100M • Updated Nov 2 • 77.2k • 58 • 2
codelion/finewiki-10M • Updated Nov 2 • 4.91k • 106 • 2
codelion/finepdfs-10M • Updated Nov 2 • 7.54k • 122 • 2
codelion/fineweb-edu-10M • Updated Nov 2 • 9.46k • 233 • 2
codelion/dclm-baseline-10M • Updated Nov 2 • 7.95k • 114 • 2
codelion/finewiki-100M • Updated Nov 2 • 68k • 94 • 2
codelion/finewiki-1B • Updated Nov 2 • 52.7k • 254 • 2
sumukshashidhar-archive/essential-web-v1.0-sample-1B • Updated Jul 3 • 1.83M • 185 • 2
codelion/finepdfs-1B • Updated Nov 2 • 186k • 771 • 2
liked a model 13 days ago
Qwen/Qwen3-4B-Instruct-2507 • Text Generation • 4B • Updated Sep 17 • 6.29M • 529
liked a Space 22 days ago
Prompt Optimizer 🐨 • Running • 7
Optimize prompts using OpenEvolve
liked 2 models 24 days ago
deepseek-ai/DeepSeek-Prover-V2-7B • 7B • Updated Apr 30 • 46.4k • 130
adaptive-classifier/chayan • Text Classification • Updated 7 days ago • 50 • 3
liked a dataset 28 days ago
PleIAs/SYNTH • Updated 27 days ago • 68M • 65.2k • 194
liked 3 datasets about 1 month ago
codelion/fineweb-edu-100M • Updated Nov 2 • 115k • 202 • 3
codelion/fineweb-edu-1B • Updated Nov 2 • 970k • 1.55k • 6
codelion/dclm-baseline-1B • Updated Nov 2 • 774k • 1.3k • 4
liked a model about 1 month ago
patched-codes/Llama-3.2-1B-FixVulns • Updated Nov 10, 2024 • 2